[RFC,v19,1/5] exec: Add a new AT_CHECK flag to execveat(2)

Message ID	20240704190137.696169-2-mic@digikod.net (mailing list archive)
State	New
Headers	show Received: from smtp-42ab.mail.infomaniak.ch (smtp-42ab.mail.infomaniak.ch [84.16.66.171]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A6BB213D608; Thu, 4 Jul 2024 19:02:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=84.16.66.171 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720119739; cv=none; b=Yle/pfJV+2JqAwIpTxMRbH/w1cbrg8E0fHKnPrFpbFSmoOx4FQg3CKleopzwN/CmgKURA/mAvIgoVOhZuOJAnsAqIwYsnfXqo98fm4FpBotJcXvnRokZdF7UZYS6HbIZH+DHL9KHwCM9imyrEGvS5SWqSA6pzdoWMBzjPgzwMsA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720119739; c=relaxed/simple; bh=QkSeR/S7n24t9rwdBxmCu1WbP7aeAIrwOrKa2VVZbjA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=NoxNucjD78ljwxXLfIxVcXSoFFPQVOqClycV+XMpJ3zpw5ke6Mw/dG6EvXN4kQMZIxwtpXWj8xdHvYTOa3Ho32N0REmUJw+D8ZGVXMMifoOzzphBh+CUpM+/OePhfsOp/p04mkxFem1w2qDhOar+Z1SRX/V17z8emeK5kXHWNeo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net; spf=pass smtp.mailfrom=digikod.net; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b=RoJpMmrw; arc=none smtp.client-ip=84.16.66.171 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=digikod.net Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=digikod.net Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=digikod.net header.i=@digikod.net header.b="RoJpMmrw" Received: from smtp-3-0001.mail.infomaniak.ch (smtp-3-0001.mail.infomaniak.ch [10.4.36.108]) by smtp-4-3000.mail.infomaniak.ch (Postfix) with ESMTPS id 4WFQxW5nlpzC3q; Thu, 4 Jul 2024 21:02:07 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=digikod.net; s=20191114; t=1720119727; bh=Y9d96OsDMPb7+yCSisfZWoz+qtMBDpJ4FX9JzX/byVk=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=RoJpMmrwajgMIZzj+FDE8d2fMbX6V4zlKWa7FHgUGq82BsioXPo8PzYQfoHIFVUkz TV8A2QFNZM2+q+h0r43ok920fl0wvuPue9FRnVPait38jkHnUbIUXrLRWLEaLimXlR 1rX+c3eFRYDpywr2XT9ozXTEknnIa3ZHKGwnQiAQ= Received: from unknown by smtp-3-0001.mail.infomaniak.ch (Postfix) with ESMTPA id 4WFQxR1bl1z3JG; Thu, 4 Jul 2024 21:02:03 +0200 (CEST) From: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= <mic@digikod.net> To: Al Viro <viro@zeniv.linux.org.uk>, Christian Brauner <brauner@kernel.org>, Kees Cook <keescook@chromium.org>, Linus Torvalds <torvalds@linux-foundation.org>, Paul Moore <paul@paul-moore.com>, Theodore Ts'o <tytso@mit.edu> Cc: =?utf-8?q?Micka=C3=ABl_Sala=C3=BCn?= <mic@digikod.net>, Alejandro Colomar <alx.manpages@gmail.com>, Aleksa Sarai <cyphar@cyphar.com>, Andrew Morton <akpm@linux-foundation.org>, Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>, Casey Schaufler <casey@schaufler-ca.com>, Christian Heimes <christian@python.org>, Dmitry Vyukov <dvyukov@google.com>, Eric Biggers <ebiggers@kernel.org>, Eric Chiang <ericchiang@google.com>, Fan Wu <wufan@linux.microsoft.com>, Florian Weimer <fweimer@redhat.com>, Geert Uytterhoeven <geert@linux-m68k.org>, James Morris <jamorris@linux.microsoft.com>, Jan Kara <jack@suse.cz>, Jann Horn <jannh@google.com>, Jeff Xu <jeffxu@google.com>, Jonathan Corbet <corbet@lwn.net>, Jordan R Abrahams <ajordanr@google.com>, Lakshmi Ramasubramanian <nramas@linux.microsoft.com>, Luca Boccassi <bluca@debian.org>, Luis Chamberlain <mcgrof@kernel.org>, "Madhavan T . Venkataraman" <madvenka@linux.microsoft.com>, Matt Bobrowski <mattbobrowski@google.com>, Matthew Garrett <mjg59@srcf.ucam.org>, Matthew Wilcox <willy@infradead.org>, Miklos Szeredi <mszeredi@redhat.com>, Mimi Zohar <zohar@linux.ibm.com>, Nicolas Bouchinet <nicolas.bouchinet@ssi.gouv.fr>, Scott Shell <scottsh@microsoft.com>, Shuah Khan <shuah@kernel.org>, Stephen Rothwell <sfr@canb.auug.org.au>, Steve Dower <steve.dower@python.org>, Steve Grubb <sgrubb@redhat.com>, Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>, Vincent Strubel <vincent.strubel@ssi.gouv.fr>, Xiaoming Ni <nixiaoming@huawei.com>, Yin Fengwei <fengwei.yin@intel.com>, kernel-hardening@lists.openwall.com, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-integrity@vger.kernel.org, linux-kernel@vger.kernel.org, linux-security-module@vger.kernel.org Subject: [RFC PATCH v19 1/5] exec: Add a new AT_CHECK flag to execveat(2) Date: Thu, 4 Jul 2024 21:01:33 +0200 Message-ID: <20240704190137.696169-2-mic@digikod.net> In-Reply-To: <20240704190137.696169-1-mic@digikod.net> References: <20240704190137.696169-1-mic@digikod.net> Precedence: bulk X-Mailing-List: linux-fsdevel@vger.kernel.org List-Id: <linux-fsdevel.vger.kernel.org> List-Subscribe: <mailto:linux-fsdevel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-fsdevel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Infomaniak-Routing: alpha
Series	Script execution control (was O_MAYEXEC) \| expand [RFC,v19,0/5] Script execution control (was O_MAYEXEC) [RFC,v19,1/5] exec: Add a new AT_CHECK flag to execveat(2) [RFC,v19,2/5] security: Add new SHOULD_EXEC_CHECK and SHOULD_EXEC_RESTRICT securebits [RFC,v19,3/5] selftests/exec: Add tests for AT_CHECK and related securebits [RFC,v19,4/5] selftests/landlock: Add tests for execveat + AT_CHECK [RFC,v19,5/5] samples/should-exec: Add set-should-exec

Mickaël Salaün July 4, 2024, 7:01 p.m. UTC

Add a new AT_CHECK flag to execveat(2) to check if a file would be
allowed for execution.  The main use case is for script interpreters and
dynamic linkers to check execution permission according to the kernel's
security policy. Another use case is to add context to access logs e.g.,
which script (instead of interpreter) accessed a file.  As any
executable code, scripts could also use this check [1].

This is different than faccessat(2) which only checks file access
rights, but not the full context e.g. mount point's noexec, stack limit,
and all potential LSM extra checks (e.g. argv, envp, credentials).
Since the use of AT_CHECK follows the exact kernel semantic as for a
real execution, user space gets the same error codes.

With the information that a script interpreter is about to interpret a
script, an LSM security policy can adjust caller's access rights or log
execution request as for native script execution (e.g. role transition).
This is possible thanks to the call to security_bprm_creds_for_exec().

Because LSMs may only change bprm's credentials, use of AT_CHECK with
current kernel code should not be a security issue (e.g. unexpected role
transition).  LSMs willing to update the caller's credential could now
do so when bprm->is_check is set.  Of course, such policy change should
be in line with the new user space code.

Because AT_CHECK is dedicated to user space interpreters, it doesn't
make sense for the kernel to parse the checked files, look for
interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
if the format is unknown.  Because of that, security_bprm_check() is
never called when AT_CHECK is used.

It should be noted that script interpreters cannot directly use
execveat(2) (without this new AT_CHECK flag) because this could lead to
unexpected behaviors e.g., `python script.sh` could lead to Bash being
executed to interpret the script.  Unlike the kernel, script
interpreters may just interpret the shebang as a simple comment, which
should not change for backward compatibility reasons.

Because scripts or libraries files might not currently have the
executable permission set, or because we might want specific users to be
allowed to run arbitrary scripts, the following patch provides a dynamic
configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
SECBIT_SHOULD_EXEC_RESTRICT securebits.

This is a redesign of the CLIP OS 4's O_MAYEXEC:
https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
This patch has been used for more than a decade with customized script
interpreters.  Some examples can be found here:
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://docs.python.org/3/library/io.html#io.open_code [1]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
---

New design since v18:
https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
---
 fs/exec.c                  |  5 +++--
 include/linux/binfmts.h    |  7 ++++++-
 include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
 kernel/audit.h             |  1 +
 kernel/auditsc.c           |  1 +
 5 files changed, 41 insertions(+), 3 deletions(-)

Kees Cook July 5, 2024, 12:04 a.m. UTC | #1

On Thu, Jul 04, 2024 at 09:01:33PM +0200, Mickaël Salaün wrote:
> Add a new AT_CHECK flag to execveat(2) to check if a file would be
> allowed for execution.  The main use case is for script interpreters and
> dynamic linkers to check execution permission according to the kernel's
> security policy. Another use case is to add context to access logs e.g.,
> which script (instead of interpreter) accessed a file.  As any
> executable code, scripts could also use this check [1].
> 
> This is different than faccessat(2) which only checks file access
> rights, but not the full context e.g. mount point's noexec, stack limit,
> and all potential LSM extra checks (e.g. argv, envp, credentials).
> Since the use of AT_CHECK follows the exact kernel semantic as for a
> real execution, user space gets the same error codes.

Nice! I much prefer this method of going through the exec machinery so
we always have a single code path for these kinds of checks.

> Because AT_CHECK is dedicated to user space interpreters, it doesn't
> make sense for the kernel to parse the checked files, look for
> interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> if the format is unknown.  Because of that, security_bprm_check() is
> never called when AT_CHECK is used.

I'd like some additional comments in the code that reminds us that
access control checks have finished past a certain point.

[...]
> diff --git a/fs/exec.c b/fs/exec.c
> index 40073142288f..ea2a1867afdc 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
>  		.lookup_flags = LOOKUP_FOLLOW,
>  	};
>  
> -	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> +	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
>  		return ERR_PTR(-EINVAL);
>  	if (flags & AT_SYMLINK_NOFOLLOW)
>  		open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
[...]
> + * To avoid race conditions leading to time-of-check to time-of-use issues,
> + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> + * descriptor instead of a path.

I want this enforced by the kernel. Let's not leave trivial ToCToU
foot-guns around. i.e.:

	if ((flags & AT_CHECK) == AT_CHECK && (flags & AT_EMPTY_PATH) == 0)
  		return ERR_PTR(-EBADF);

Mickaël Salaün July 5, 2024, 5:53 p.m. UTC | #2

On Thu, Jul 04, 2024 at 05:04:03PM -0700, Kees Cook wrote:
> On Thu, Jul 04, 2024 at 09:01:33PM +0200, Mickaël Salaün wrote:
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
> > 
> > This is different than faccessat(2) which only checks file access
> > rights, but not the full context e.g. mount point's noexec, stack limit,
> > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > real execution, user space gets the same error codes.
> 
> Nice! I much prefer this method of going through the exec machinery so
> we always have a single code path for these kinds of checks.
> 
> > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > make sense for the kernel to parse the checked files, look for
> > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > if the format is unknown.  Because of that, security_bprm_check() is
> > never called when AT_CHECK is used.
> 
> I'd like some additional comments in the code that reminds us that
> access control checks have finished past a certain point.

Where in the code? Just before the bprm->is_check assignment?

> 
> [...]
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 40073142288f..ea2a1867afdc 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> >  		.lookup_flags = LOOKUP_FOLLOW,
> >  	};
> >  
> > -	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > +	if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> >  		return ERR_PTR(-EINVAL);
> >  	if (flags & AT_SYMLINK_NOFOLLOW)
> >  		open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> [...]
> > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > + * descriptor instead of a path.
> 
> I want this enforced by the kernel. Let's not leave trivial ToCToU
> foot-guns around. i.e.:
> 
> 	if ((flags & AT_CHECK) == AT_CHECK && (flags & AT_EMPTY_PATH) == 0)
>   		return ERR_PTR(-EBADF);

There are valid use cases relying on pathnames. See Linus's comment:
https://lore.kernel.org/r/CAHk-=whb=XuU=LGKnJWaa7LOYQz9VwHs8SLfgLbT5sf2VAbX1A@mail.gmail.com

Florian Weimer July 5, 2024, 6:03 p.m. UTC | #3

* Mickaël Salaün:

> Add a new AT_CHECK flag to execveat(2) to check if a file would be
> allowed for execution.  The main use case is for script interpreters and
> dynamic linkers to check execution permission according to the kernel's
> security policy. Another use case is to add context to access logs e.g.,
> which script (instead of interpreter) accessed a file.  As any
> executable code, scripts could also use this check [1].

Some distributions no longer set executable bits on most shared objects,
which I assume would interfere with AT_CHECK probing for shared objects.
Removing the executable bit is attractive because of a combination of
two bugs: a binutils wart which until recently always set the entry
point address in the ELF header to zero, and the kernel not checking for
a zero entry point (maybe in combination with an absent program
interpreter) and failing the execve with ELIBEXEC, instead of doing the
execve and then faulting at virtual address zero.  Removing the
executable bit is currently the only way to avoid these confusing
crashes, so I understand the temptation.

Thanks,
Florian

Andy Lutomirski July 6, 2024, 8:52 a.m. UTC | #4

On Fri, Jul 5, 2024 at 3:03 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> Add a new AT_CHECK flag to execveat(2) to check if a file would be
> allowed for execution.  The main use case is for script interpreters and
> dynamic linkers to check execution permission according to the kernel's
> security policy. Another use case is to add context to access logs e.g.,
> which script (instead of interpreter) accessed a file.  As any
> executable code, scripts could also use this check [1].
>

Can you give a worked-out example of how this is useful?

I assume the idea is that a program could open a file, then pass the
fd to execveat() to get the kernel's idea of whether it's permissible
to execute it.  And then the program would interpret the file, which
is morally like executing it.  And there would be a big warning in the
manpage that passing a *path* is subject to a TOCTOU race.

This type of usage will do the wrong thing if LSM policy intends to
lock down the task if the task were to actually exec the file.  I
personally think this is a mis-design (let the program doing the
exec-ing lock itself down, possibly by querying a policy, but having
magic happen on exec seems likely to do the wrong thing more often
that it does the wright thing), but that ship sailed a long time ago.

So maybe what's actually needed is a rather different API: a way to
check *and perform* the security transition for an exec without
actually execing.  This would need to be done NO_NEW_PRIVS style for
reasons that are hopefully obvious, but it would permit:

fd = open(some script);
if (do_exec_transition_without_exec(fd) != 0)
  return;  // don't actually do it

// OK, we may have just lost privileges.  But that's okay, because we
meant to do that.
// Make sure we've munmapped anything sensitive and erased any secrets
from memory,
// and then interpret the script!

I think this would actually be straightforward to implement in the
kernel -- one would need to make sure that all the relevant
no_new_privs checks are looking in the right place (as the task might
not actually have no_new_privs set, but LSM_UNSAFE_NO_NEW_PRIVS would
still be set), but I don't see any reason this would be
insurmountable, nor do I expect there would be any fundamental
problems.

> This is different than faccessat(2) which only checks file access
> rights, but not the full context e.g. mount point's noexec, stack limit,
> and all potential LSM extra checks (e.g. argv, envp, credentials).
> Since the use of AT_CHECK follows the exact kernel semantic as for a
> real execution, user space gets the same error codes.
>
> With the information that a script interpreter is about to interpret a
> script, an LSM security policy can adjust caller's access rights or log
> execution request as for native script execution (e.g. role transition).
> This is possible thanks to the call to security_bprm_creds_for_exec().
>
> Because LSMs may only change bprm's credentials, use of AT_CHECK with
> current kernel code should not be a security issue (e.g. unexpected role
> transition).  LSMs willing to update the caller's credential could now
> do so when bprm->is_check is set.  Of course, such policy change should
> be in line with the new user space code.
>
> Because AT_CHECK is dedicated to user space interpreters, it doesn't
> make sense for the kernel to parse the checked files, look for
> interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> if the format is unknown.  Because of that, security_bprm_check() is
> never called when AT_CHECK is used.
>
> It should be noted that script interpreters cannot directly use
> execveat(2) (without this new AT_CHECK flag) because this could lead to
> unexpected behaviors e.g., `python script.sh` could lead to Bash being
> executed to interpret the script.  Unlike the kernel, script
> interpreters may just interpret the shebang as a simple comment, which
> should not change for backward compatibility reasons.
>
> Because scripts or libraries files might not currently have the
> executable permission set, or because we might want specific users to be
> allowed to run arbitrary scripts, the following patch provides a dynamic
> configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> SECBIT_SHOULD_EXEC_RESTRICT securebits.

Can you explain what those bits do?  And why they're useful?

>
> This is a redesign of the CLIP OS 4's O_MAYEXEC:
> https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> This patch has been used for more than a decade with customized script
> interpreters.  Some examples can be found here:
> https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC

This one at least returns an fd, so it looks less likely to get
misused in a way that adds a TOCTOU race.

Mickaël Salaün July 6, 2024, 2:55 p.m. UTC | #5

On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote:
> * Mickaël Salaün:
> 
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
> 
> Some distributions no longer set executable bits on most shared objects,
> which I assume would interfere with AT_CHECK probing for shared objects.

A file without the execute permission is not considered as executable by
the kernel.  The AT_CHECK flag doesn't change this semantic.  Please
note that this is just a check, not a restriction.  See the next patch
for the optional policy enforcement.

Anyway, we need to define the policy, and for Linux this is done with
the file permission bits.  So for systems willing to have a consistent
execution policy, we need to rely on the same bits.

> Removing the executable bit is attractive because of a combination of
> two bugs: a binutils wart which until recently always set the entry
> point address in the ELF header to zero, and the kernel not checking for
> a zero entry point (maybe in combination with an absent program
> interpreter) and failing the execve with ELIBEXEC, instead of doing the
> execve and then faulting at virtual address zero.  Removing the
> executable bit is currently the only way to avoid these confusing
> crashes, so I understand the temptation.

Interesting.  Can you please point to the bug report and the fix?  I
don't see any ELIBEXEC in the kernel.

FYI, AT_CHECK doesn't check the content of the file (unlike a full
execve call).

Anyway, I think we should not design a new kernel interface to work
around a current user space bug.

Florian Weimer July 6, 2024, 3:32 p.m. UTC | #6

* Mickaël Salaün:

> On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote:
>> * Mickaël Salaün:
>> 
>> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
>> > allowed for execution.  The main use case is for script interpreters and
>> > dynamic linkers to check execution permission according to the kernel's
>> > security policy. Another use case is to add context to access logs e.g.,
>> > which script (instead of interpreter) accessed a file.  As any
>> > executable code, scripts could also use this check [1].
>> 
>> Some distributions no longer set executable bits on most shared objects,
>> which I assume would interfere with AT_CHECK probing for shared objects.
>
> A file without the execute permission is not considered as executable by
> the kernel.  The AT_CHECK flag doesn't change this semantic.  Please
> note that this is just a check, not a restriction.  See the next patch
> for the optional policy enforcement.
>
> Anyway, we need to define the policy, and for Linux this is done with
> the file permission bits.  So for systems willing to have a consistent
> execution policy, we need to rely on the same bits.

Yes, that makes complete sense.  I just wanted to point out the odd
interaction with the old binutils bug and the (sadly still current)
kernel bug.

>> Removing the executable bit is attractive because of a combination of
>> two bugs: a binutils wart which until recently always set the entry
>> point address in the ELF header to zero, and the kernel not checking for
>> a zero entry point (maybe in combination with an absent program
>> interpreter) and failing the execve with ELIBEXEC, instead of doing the
>> execve and then faulting at virtual address zero.  Removing the
>> executable bit is currently the only way to avoid these confusing
>> crashes, so I understand the temptation.
>
> Interesting.  Can you please point to the bug report and the fix?  I
> don't see any ELIBEXEC in the kernel.

The kernel hasn't been fixed yet.  I do think this should be fixed, so
that distributions can bring back the executable bit.

Thanks,
Florian

Mickaël Salaün July 7, 2024, 9:01 a.m. UTC | #7

On Sat, Jul 06, 2024 at 04:52:42PM +0800, Andy Lutomirski wrote:
> On Fri, Jul 5, 2024 at 3:03 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
> >
> 
> Can you give a worked-out example of how this is useful?

Which part?  Please take a look at CLIP OS, chromeOS, and PEP 578 use
cases and related code (see cover letter).

> 
> I assume the idea is that a program could open a file, then pass the
> fd to execveat() to get the kernel's idea of whether it's permissible
> to execute it.  And then the program would interpret the file, which
> is morally like executing it.  And there would be a big warning in the
> manpage that passing a *path* is subject to a TOCTOU race.

yes

> 
> This type of usage will do the wrong thing if LSM policy intends to
> lock down the task if the task were to actually exec the file.  I

Why? LSMs should currently only change the bprm's credentials not the
current's credentials.  If needed, we can extend the current patch
series with LSM specific patches for them to check bprm->is_check.

> personally think this is a mis-design (let the program doing the
> exec-ing lock itself down, possibly by querying a policy, but having
> magic happen on exec seems likely to do the wrong thing more often
> that it does the wright thing), but that ship sailed a long time ago.

The execveat+AT_CHECK is only a check that doesn't impact the caller.
Maybe you're talking about process transition with future LSM changes?
In this case, we could add another flag, but I'm convinced it would be
confusing for users.  Anyway, let LSMs experiment with that and we'll
come up with a new flag if needed.  The current approach is a good and
useful piece to fill a gap in Linux access control systems.

> 
> So maybe what's actually needed is a rather different API: a way to
> check *and perform* the security transition for an exec without
> actually execing.  This would need to be done NO_NEW_PRIVS style for
> reasons that are hopefully obvious, but it would permit:

NO_NEW_PRIVS is not that obvious in this case because the restrictions
are enforced by user space, not the kernel.  NO_NEW_PRIVS makes sense to
avoid kernel restrictions be requested by a malicious/unprivileged
process to change the behavior of a (child) privileged/trusted process.
We are not in this configuration here.  The only change would be for
ptrace, which is a good thing either way and should not harm SUID
processes but avoid confused deputy attack for them too.

If this is about an LSM changing the caller's credentials, then yes it
might want to set additional flags, but that would be specific to their
implementation, not part of this patch.

> 
> fd = open(some script);
> if (do_exec_transition_without_exec(fd) != 0)
>   return;  // don't actually do it
> 
> // OK, we may have just lost privileges.  But that's okay, because we
> meant to do that.
> // Make sure we've munmapped anything sensitive and erased any secrets
> from memory,
> // and then interpret the script!
> 
> I think this would actually be straightforward to implement in the
> kernel -- one would need to make sure that all the relevant
> no_new_privs checks are looking in the right place (as the task might
> not actually have no_new_privs set, but LSM_UNSAFE_NO_NEW_PRIVS would
> still be set), but I don't see any reason this would be
> insurmountable, nor do I expect there would be any fundamental
> problems.

OK, that's what is described below with security_bprm_creds_for_exec().
Each LSM can implement this change with the current patch series, but
that should be part of a dedicated patch series per LSM, for those
willing to leverage this new feature.

> 
> 
> > This is different than faccessat(2) which only checks file access
> > rights, but not the full context e.g. mount point's noexec, stack limit,
> > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > real execution, user space gets the same error codes.
> >
> > With the information that a script interpreter is about to interpret a
> > script, an LSM security policy can adjust caller's access rights or log
> > execution request as for native script execution (e.g. role transition).
> > This is possible thanks to the call to security_bprm_creds_for_exec().
> >
> > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > current kernel code should not be a security issue (e.g. unexpected role
> > transition).  LSMs willing to update the caller's credential could now
> > do so when bprm->is_check is set.  Of course, such policy change should
> > be in line with the new user space code.
> >
> > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > make sense for the kernel to parse the checked files, look for
> > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > if the format is unknown.  Because of that, security_bprm_check() is
> > never called when AT_CHECK is used.
> >
> > It should be noted that script interpreters cannot directly use
> > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > executed to interpret the script.  Unlike the kernel, script
> > interpreters may just interpret the shebang as a simple comment, which
> > should not change for backward compatibility reasons.
> >
> > Because scripts or libraries files might not currently have the
> > executable permission set, or because we might want specific users to be
> > allowed to run arbitrary scripts, the following patch provides a dynamic
> > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> 
> Can you explain what those bits do?  And why they're useful?

I didn't want to duplicate the comments above their definition
explaining their usage.  Please let me know if it's not enough.

> 
> >
> > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > This patch has been used for more than a decade with customized script
> > interpreters.  Some examples can be found here:
> > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> 
> This one at least returns an fd, so it looks less likely to get
> misused in a way that adds a TOCTOU race.

We can use both an FD or a path name with execveat(2).  See discussion
with Kees and comment from Linus.

Mickaël Salaün July 8, 2024, 8:56 a.m. UTC | #8

On Sat, Jul 06, 2024 at 05:32:12PM +0200, Florian Weimer wrote:
> * Mickaël Salaün:
> 
> > On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote:
> >> * Mickaël Salaün:
> >> 
> >> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> >> > allowed for execution.  The main use case is for script interpreters and
> >> > dynamic linkers to check execution permission according to the kernel's
> >> > security policy. Another use case is to add context to access logs e.g.,
> >> > which script (instead of interpreter) accessed a file.  As any
> >> > executable code, scripts could also use this check [1].
> >> 
> >> Some distributions no longer set executable bits on most shared objects,
> >> which I assume would interfere with AT_CHECK probing for shared objects.
> >
> > A file without the execute permission is not considered as executable by
> > the kernel.  The AT_CHECK flag doesn't change this semantic.  Please
> > note that this is just a check, not a restriction.  See the next patch
> > for the optional policy enforcement.
> >
> > Anyway, we need to define the policy, and for Linux this is done with
> > the file permission bits.  So for systems willing to have a consistent
> > execution policy, we need to rely on the same bits.
> 
> Yes, that makes complete sense.  I just wanted to point out the odd
> interaction with the old binutils bug and the (sadly still current)
> kernel bug.
> 
> >> Removing the executable bit is attractive because of a combination of
> >> two bugs: a binutils wart which until recently always set the entry
> >> point address in the ELF header to zero, and the kernel not checking for
> >> a zero entry point (maybe in combination with an absent program
> >> interpreter) and failing the execve with ELIBEXEC, instead of doing the
> >> execve and then faulting at virtual address zero.  Removing the
> >> executable bit is currently the only way to avoid these confusing
> >> crashes, so I understand the temptation.
> >
> > Interesting.  Can you please point to the bug report and the fix?  I
> > don't see any ELIBEXEC in the kernel.
> 
> The kernel hasn't been fixed yet.  I do think this should be fixed, so
> that distributions can bring back the executable bit.

Can you please point to the mailing list discussion or the bug report?

Jeff Xu July 8, 2024, 4:08 p.m. UTC | #9

Hi

On Fri, Jul 5, 2024 at 11:03 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Mickaël Salaün:
>
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
>
> Some distributions no longer set executable bits on most shared objects,
> which I assume would interfere with AT_CHECK probing for shared objects.
> Removing the executable bit is attractive because of a combination of
> two bugs: a binutils wart which until recently always set the entry
> point address in the ELF header to zero, and the kernel not checking for
> a zero entry point (maybe in combination with an absent program
> interpreter) and failing the execve with ELIBEXEC, instead of doing the
> execve and then faulting at virtual address zero.  Removing the
> executable bit is currently the only way to avoid these confusing
> crashes, so I understand the temptation.
>
Will dynamic linkers use the execveat(AT_CHECK) to check shared
libraries too ?  or just the main executable itself.

Thanks.
-Jeff


> Thanks,
> Florian
>

Florian Weimer July 8, 2024, 4:25 p.m. UTC | #10

* Jeff Xu:

> Will dynamic linkers use the execveat(AT_CHECK) to check shared
> libraries too ?  or just the main executable itself.

I expect that dynamic linkers will have to do this for everything they
map.  Usually, that does not include the maim program, but this can
happen with explicit loader invocations (“ld.so /bin/true”).

Thanks,
Florian

Jeff Xu July 8, 2024, 4:40 p.m. UTC | #11

On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Jeff Xu:
>
> > Will dynamic linkers use the execveat(AT_CHECK) to check shared
> > libraries too ?  or just the main executable itself.
>
> I expect that dynamic linkers will have to do this for everything they
> map.
Then all the objects (.so, .sh, etc.) will go through  the check from
execveat's main  to security_bprm_creds_for_exec(), some of them might
be specific for the main executable ?
e.g. ChromeOS uses security_bprm_creds_for_exec to block executable
memfd [1], applying this means automatically extending the block to
the .so object.

I'm not sure if other LSMs need to be updated ?  e.g.  will  SELINUX
check for .so with its process transaction policy ?

[1] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3834992

-Jeff


> Usually, that does not include the maim program, but this can
> happen with explicit loader invocations (“ld.so /bin/true”).
>
> Thanks,
> Florian
>

Mickaël Salaün July 8, 2024, 5:05 p.m. UTC | #12

On Mon, Jul 08, 2024 at 09:40:45AM -0700, Jeff Xu wrote:
> On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Jeff Xu:
> >
> > > Will dynamic linkers use the execveat(AT_CHECK) to check shared
> > > libraries too ?  or just the main executable itself.
> >
> > I expect that dynamic linkers will have to do this for everything they
> > map.

Correct, that would enable to safely handle LD_PRELOAD for instance.

> Then all the objects (.so, .sh, etc.) will go through  the check from
> execveat's main  to security_bprm_creds_for_exec(), some of them might
> be specific for the main executable ?
> e.g. ChromeOS uses security_bprm_creds_for_exec to block executable
> memfd [1], applying this means automatically extending the block to
> the .so object.

That's a good example of how this AT_CHECK check makes sense.

Landlock will probably get a similar (optional) restriction too:
https://github.com/landlock-lsm/linux/issues/37

> 
> I'm not sure if other LSMs need to be updated ?  e.g.  will  SELINUX
> check for .so with its process transaction policy ?

LSM should not need to be updated with this patch series.  However,
systems/components/containers enabling this new check should make sure
it works with their current policy.

> 
> [1] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3834992
> 
> -Jeff
> 
> 
> > Usually, that does not include the maim program, but this can
> > happen with explicit loader invocations (“ld.so /bin/true”).
> >
> > Thanks,
> > Florian
> >

Florian Weimer July 8, 2024, 5:33 p.m. UTC | #13

* Jeff Xu:

> On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Jeff Xu:
>>
>> > Will dynamic linkers use the execveat(AT_CHECK) to check shared
>> > libraries too ?  or just the main executable itself.
>>
>> I expect that dynamic linkers will have to do this for everything they
>> map.
> Then all the objects (.so, .sh, etc.) will go through  the check from
> execveat's main  to security_bprm_creds_for_exec(), some of them might
> be specific for the main executable ?

If we want to avoid that, we could have an agreed-upon error code which
the LSM can signal that it'll never fail AT_CHECK checks, so we only
have to perform the extra system call once.

Thanks,
Florian

Jeff Xu July 8, 2024, 5:52 p.m. UTC | #14

On Mon, Jul 8, 2024 at 10:33 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Jeff Xu:
>
> > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Jeff Xu:
> >>
> >> > Will dynamic linkers use the execveat(AT_CHECK) to check shared
> >> > libraries too ?  or just the main executable itself.
> >>
> >> I expect that dynamic linkers will have to do this for everything they
> >> map.
> > Then all the objects (.so, .sh, etc.) will go through  the check from
> > execveat's main  to security_bprm_creds_for_exec(), some of them might
> > be specific for the main executable ?
>
> If we want to avoid that, we could have an agreed-upon error code which
> the LSM can signal that it'll never fail AT_CHECK checks, so we only
> have to perform the extra system call once.
>
Right, something like that.
I would prefer not having AT_CHECK specific code in LSM code as an
initial goal, if that works, great.

-Jeff

> Thanks,
> Florian
>

Kees Cook July 8, 2024, 7:38 p.m. UTC | #15

On Fri, Jul 05, 2024 at 07:53:10PM +0200, Mickaël Salaün wrote:
> On Thu, Jul 04, 2024 at 05:04:03PM -0700, Kees Cook wrote:
> > On Thu, Jul 04, 2024 at 09:01:33PM +0200, Mickaël Salaün wrote:
> > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > allowed for execution.  The main use case is for script interpreters and
> > > dynamic linkers to check execution permission according to the kernel's
> > > security policy. Another use case is to add context to access logs e.g.,
> > > which script (instead of interpreter) accessed a file.  As any
> > > executable code, scripts could also use this check [1].
> > > 
> > > This is different than faccessat(2) which only checks file access
> > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > real execution, user space gets the same error codes.
> > 
> > Nice! I much prefer this method of going through the exec machinery so
> > we always have a single code path for these kinds of checks.
> > 
> > > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > > make sense for the kernel to parse the checked files, look for
> > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > if the format is unknown.  Because of that, security_bprm_check() is
> > > never called when AT_CHECK is used.
> > 
> > I'd like some additional comments in the code that reminds us that
> > access control checks have finished past a certain point.
> 
> Where in the code? Just before the bprm->is_check assignment?

Yeah, that's what I was thinking.

Mickaël Salaün July 9, 2024, 9:18 a.m. UTC | #16

On Mon, Jul 08, 2024 at 10:52:36AM -0700, Jeff Xu wrote:
> On Mon, Jul 8, 2024 at 10:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * Jeff Xu:
> >
> > > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >>
> > >> * Jeff Xu:
> > >>
> > >> > Will dynamic linkers use the execveat(AT_CHECK) to check shared
> > >> > libraries too ?  or just the main executable itself.
> > >>
> > >> I expect that dynamic linkers will have to do this for everything they
> > >> map.
> > > Then all the objects (.so, .sh, etc.) will go through  the check from
> > > execveat's main  to security_bprm_creds_for_exec(), some of them might
> > > be specific for the main executable ?

Yes, we should check every executable code (including seccomp filters)
to get a consistent policy.

What do you mean by "specific for the main executable"?

> >
> > If we want to avoid that, we could have an agreed-upon error code which
> > the LSM can signal that it'll never fail AT_CHECK checks, so we only
> > have to perform the extra system call once.

I'm not sure to follow.  Either we check executable code or we don't,
but it doesn't make sense to only check some parts (except for migration
of user space code in a system, which is one purpose of the securebits
added with the next patch).

The idea with AT_CHECK is to unconditionnaly check executable right the
same way it is checked when a file is executed.  User space can decide
to check that or not according to its policy (i.e. securebits).

> >
> Right, something like that.
> I would prefer not having AT_CHECK specific code in LSM code as an
> initial goal, if that works, great.

LSMs should not need to change anything, but they are free to implement
new access right according to AT_CHECK.

> 
> -Jeff
> 
> > Thanks,
> > Florian
> >

Florian Weimer July 9, 2024, 10:05 a.m. UTC | #17

* Mickaël Salaün:

>> > If we want to avoid that, we could have an agreed-upon error code which
>> > the LSM can signal that it'll never fail AT_CHECK checks, so we only
>> > have to perform the extra system call once.
>
> I'm not sure to follow.  Either we check executable code or we don't,
> but it doesn't make sense to only check some parts (except for migration
> of user space code in a system, which is one purpose of the securebits
> added with the next patch).
>
> The idea with AT_CHECK is to unconditionnaly check executable right the
> same way it is checked when a file is executed.  User space can decide
> to check that or not according to its policy (i.e. securebits).

I meant it purely as a performance optimization, to skip future system
calls if we know they won't provide any useful information for this
process.  In the grand scheme of things, the extra system call probably
does not matter because we already have to do costly things like mmap.

Thanks,
Florian

Jeff Xu July 9, 2024, 6:57 p.m. UTC | #18

On Tue, Jul 9, 2024 at 2:18 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Mon, Jul 08, 2024 at 10:52:36AM -0700, Jeff Xu wrote:
> > On Mon, Jul 8, 2024 at 10:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * Jeff Xu:
> > >
> > > > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >>
> > > >> * Jeff Xu:
> > > >>
> > > >> > Will dynamic linkers use the execveat(AT_CHECK) to check shared
> > > >> > libraries too ?  or just the main executable itself.
> > > >>
> > > >> I expect that dynamic linkers will have to do this for everything they
> > > >> map.
> > > > Then all the objects (.so, .sh, etc.) will go through  the check from
> > > > execveat's main  to security_bprm_creds_for_exec(), some of them might
> > > > be specific for the main executable ?
>
> Yes, we should check every executable code (including seccomp filters)
> to get a consistent policy.
>
> What do you mean by "specific for the main executable"?
>
I meant:

The check is for the exe itself, not .so, etc.

For example:  /usr/bin/touch is checked.
not the shared objects:
ldd /usr/bin/touch
linux-vdso.so.1 (0x00007ffdc988f000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f59b6757000)
/lib64/ld-linux-x86-64.so.2 (0x00007f59b6986000)

Basically, I asked if the check can be extended to shared-objects,
seccomp filters, etc, without modifying existing LSMs.
you pointed out "LSM should not need to be updated with this patch
series.", which already answered my question.

Thanks.
-Jeff

-Jeff

Mickaël Salaün July 9, 2024, 8:41 p.m. UTC | #19

On Tue, Jul 09, 2024 at 11:57:27AM -0700, Jeff Xu wrote:
> On Tue, Jul 9, 2024 at 2:18 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Mon, Jul 08, 2024 at 10:52:36AM -0700, Jeff Xu wrote:
> > > On Mon, Jul 8, 2024 at 10:33 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >
> > > > * Jeff Xu:
> > > >
> > > > > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > > >>
> > > > >> * Jeff Xu:
> > > > >>
> > > > >> > Will dynamic linkers use the execveat(AT_CHECK) to check shared
> > > > >> > libraries too ?  or just the main executable itself.
> > > > >>
> > > > >> I expect that dynamic linkers will have to do this for everything they
> > > > >> map.
> > > > > Then all the objects (.so, .sh, etc.) will go through  the check from
> > > > > execveat's main  to security_bprm_creds_for_exec(), some of them might
> > > > > be specific for the main executable ?
> >
> > Yes, we should check every executable code (including seccomp filters)
> > to get a consistent policy.
> >
> > What do you mean by "specific for the main executable"?
> >
> I meant:
> 
> The check is for the exe itself, not .so, etc.
> 
> For example:  /usr/bin/touch is checked.
> not the shared objects:
> ldd /usr/bin/touch
> linux-vdso.so.1 (0x00007ffdc988f000)
> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f59b6757000)
> /lib64/ld-linux-x86-64.so.2 (0x00007f59b6986000)

ld.so should be patched to check shared-objects.

> 
> Basically, I asked if the check can be extended to shared-objects,
> seccomp filters, etc, without modifying existing LSMs.

Yes, the check should be used against any piece of code such as
shared-objects, seccomp filters...

> you pointed out "LSM should not need to be updated with this patch
> series.", which already answered my question.
> 
> Thanks.
> -Jeff
> 
> -Jeff

Mickaël Salaün July 9, 2024, 8:42 p.m. UTC | #20

On Tue, Jul 09, 2024 at 12:05:50PM +0200, Florian Weimer wrote:
> * Mickaël Salaün:
> 
> >> > If we want to avoid that, we could have an agreed-upon error code which
> >> > the LSM can signal that it'll never fail AT_CHECK checks, so we only
> >> > have to perform the extra system call once.
> >
> > I'm not sure to follow.  Either we check executable code or we don't,
> > but it doesn't make sense to only check some parts (except for migration
> > of user space code in a system, which is one purpose of the securebits
> > added with the next patch).
> >
> > The idea with AT_CHECK is to unconditionnaly check executable right the
> > same way it is checked when a file is executed.  User space can decide
> > to check that or not according to its policy (i.e. securebits).
> 
> I meant it purely as a performance optimization, to skip future system
> calls if we know they won't provide any useful information for this
> process.  In the grand scheme of things, the extra system call probably
> does not matter because we already have to do costly things like mmap.

Indeed, the performance impact of execveat+AT_CHECK should be negligible
compared to everything else needed to interpret a script or spawn a
process.  Moreover, these checks should only be performed when
SECBIT_SHOULD_EXEC_CHECK is set for the caller.

Jeff Xu July 17, 2024, 6:33 a.m. UTC | #21

On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
>
> Add a new AT_CHECK flag to execveat(2) to check if a file would be
> allowed for execution.  The main use case is for script interpreters and
> dynamic linkers to check execution permission according to the kernel's
> security policy. Another use case is to add context to access logs e.g.,
> which script (instead of interpreter) accessed a file.  As any
> executable code, scripts could also use this check [1].
>
> This is different than faccessat(2) which only checks file access
> rights, but not the full context e.g. mount point's noexec, stack limit,
> and all potential LSM extra checks (e.g. argv, envp, credentials).
> Since the use of AT_CHECK follows the exact kernel semantic as for a
> real execution, user space gets the same error codes.
>
So we concluded that execveat(AT_CHECK) will be used to check the
exec, shared object, script and config file (such as seccomp config),
I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
different use cases:

execveat clearly has less code change, but that also means: we can't
add logic specific to exec (i.e. logic that can't be applied to
config) for this part (from do_execveat_common to
security_bprm_creds_for_exec) in future.  This would require some
agreement/sign-off, I'm not sure from whom.

--------------------------
now looked at user cases (focus on elf for now)

1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)

2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
/usr/bin/some.out will pass AT_CHECK

3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
/usr/bin/some.out will pass AT_CHECK, however, it uses a custom
/tmp/ld.so (I assume this is possible  for elf header will set the
path for ld.so because kernel has no knowledge of that, and
binfmt_elf.c allocate memory for ld.so during execveat call)

4> dlopen(/tmp/a.so)
I assume dynamic linker will call execveat(AT_CHECK), before map a.so
into memory.

For case 1>
Alternative solution: Because AT_CHECK is always called, I think we
can avoid the first AT_CHECK call, and check during execveat(fd),
this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
benefit is that there is no TOCTOU and save one round trip of syscall
for a succesful execveat() case.

For case 2>
dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
However,  the process can all open then mmap() directly, it seems
minimal effort for an attacker to walk around such a defence from
dynamic linker.

Alternative solution:
dynamic linker call AT_CHECK for each .so, kernel will save the state
(associated with fd)
kernel will check fd state at the time of mmap(fd, executable memory)
and enforce SECBIT_EXEC_RESTRICT_FILE = 1

Alternative solution 2:
a new syscall to load the .so and enforce the AT_CHECK in kernel

This also means, for the solution to be complete, we might want to
block creation of executable anonymous memory (e.g. by seccomp, ),
unless the user space can harden the creation of  executable anonymous
memory in some way.

For case 3>
I think binfmt_elf.c in the kernel needs to check the ld.so to make
sure it passes AT_CHECK, before loading it into memory.

For case 4>
same as case 2.

Consider those cases: I think:
a> relying purely on userspace for enforcement does't seem to be
effective,  e.g. it is trivial  to call open(), then mmap() it into
executable memory.
b> if both user space and kernel need to call AT_CHECK, the faccessat
seems to be a better place for AT_CHECK, e.g. kernel can call
do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
avoid complicating the execveat() code path.

What do you think ?

Thanks
-Jeff

> With the information that a script interpreter is about to interpret a
> script, an LSM security policy can adjust caller's access rights or log
> execution request as for native script execution (e.g. role transition).
> This is possible thanks to the call to security_bprm_creds_for_exec().
>
> Because LSMs may only change bprm's credentials, use of AT_CHECK with
> current kernel code should not be a security issue (e.g. unexpected role
> transition).  LSMs willing to update the caller's credential could now
> do so when bprm->is_check is set.  Of course, such policy change should
> be in line with the new user space code.
>
> Because AT_CHECK is dedicated to user space interpreters, it doesn't
> make sense for the kernel to parse the checked files, look for
> interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> if the format is unknown.  Because of that, security_bprm_check() is
> never called when AT_CHECK is used.
>
> It should be noted that script interpreters cannot directly use
> execveat(2) (without this new AT_CHECK flag) because this could lead to
> unexpected behaviors e.g., `python script.sh` could lead to Bash being
> executed to interpret the script.  Unlike the kernel, script
> interpreters may just interpret the shebang as a simple comment, which
> should not change for backward compatibility reasons.
>
> Because scripts or libraries files might not currently have the
> executable permission set, or because we might want specific users to be
> allowed to run arbitrary scripts, the following patch provides a dynamic
> configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> SECBIT_SHOULD_EXEC_RESTRICT securebits.
>
> This is a redesign of the CLIP OS 4's O_MAYEXEC:
> https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> This patch has been used for more than a decade with customized script
> interpreters.  Some examples can be found here:
> https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Paul Moore <paul@paul-moore.com>
> Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> Signed-off-by: Mickaël Salaün <mic@digikod.net>
> Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> ---
>
> New design since v18:
> https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> ---
>  fs/exec.c                  |  5 +++--
>  include/linux/binfmts.h    |  7 ++++++-
>  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
>  kernel/audit.h             |  1 +
>  kernel/auditsc.c           |  1 +
>  5 files changed, 41 insertions(+), 3 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 40073142288f..ea2a1867afdc 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
>                 .lookup_flags = LOOKUP_FOLLOW,
>         };
>
> -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
>                 return ERR_PTR(-EINVAL);
>         if (flags & AT_SYMLINK_NOFOLLOW)
>                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
>                 bprm->filename = bprm->fdpath;
>         }
>         bprm->interp = bprm->filename;
> +       bprm->is_check = !!(flags & AT_CHECK);
>
>         retval = bprm_mm_init(bprm);
>         if (!retval)
> @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
>
>         /* Set the unchanging part of bprm->cred */
>         retval = security_bprm_creds_for_exec(bprm);
> -       if (retval)
> +       if (retval || bprm->is_check)
>                 goto out;
>
>         retval = exec_binprm(bprm);
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 70f97f685bff..8ff9c9e33aed 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -42,7 +42,12 @@ struct linux_binprm {
>                  * Set when errors can no longer be returned to the
>                  * original userspace.
>                  */
> -               point_of_no_return:1;
> +               point_of_no_return:1,
> +               /*
> +                * Set by user space to check executability according to the
> +                * caller's environment.
> +                */
> +               is_check:1;
>         struct file *executable; /* Executable to pass to the interpreter */
>         struct file *interpreter;
>         struct file *file;
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index c0bcc185fa48..bcd05c59b7df 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -118,6 +118,36 @@
>  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
>                                         compare object identity and may not
>                                         be usable to open_by_handle_at(2) */
> +
> +/*
> + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> + * of this file would be allowed, ignoring the file format and then the related
> + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> + * thread.  See securebits.h documentation.
> + *
> + * Programs should use this check to apply kernel-level checks against files
> + * that are not directly executed by the kernel but directly passed to a user
> + * space interpreter instead.  All files that contain executable code, from the
> + * point of view of the interpreter, should be checked.  The main purpose of
> + * this flag is to improve the security and consistency of an execution
> + * environment to ensure that direct file execution (e.g. ./script.sh) and
> + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> + * instance, this can be used to check if a file is trustworthy according to
> + * the caller's environment.
> + *
> + * In a secure environment, libraries and any executable dependencies should
> + * also be checked.  For instance dynamic linking should make sure that all
> + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> + * LD_PRELOAD).  For such secure execution environment to make sense, only
> + * trusted code should be executable, which also requires integrity guarantees.
> + *
> + * To avoid race conditions leading to time-of-check to time-of-use issues,
> + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> + * descriptor instead of a path.
> + */
> +#define AT_CHECK               0x10000
> +
>  #if defined(__KERNEL__)
>  #define AT_GETATTR_NOSEC       0x80000000
>  #endif
> diff --git a/kernel/audit.h b/kernel/audit.h
> index a60d2840559e..8ebdabd2ab81 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -197,6 +197,7 @@ struct audit_context {
>                 struct open_how openat2;
>                 struct {
>                         int                     argc;
> +                       bool                    is_check;
>                 } execve;
>                 struct {
>                         char                    *name;
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 6f0d6fb6523f..b6316e284342 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
>
>         context->type = AUDIT_EXECVE;
>         context->execve.argc = bprm->argc;
> +       context->execve.is_check = bprm->is_check;
>  }
>
>
> --
> 2.45.2
>

Steve Dower July 17, 2024, 8:26 a.m. UTC | #22

On 17/07/2024 07:33, Jeff Xu wrote:
> Consider those cases: I think:
> a> relying purely on userspace for enforcement does't seem to be
> effective,  e.g. it is trivial  to call open(), then mmap() it into
> executable memory.

If there's a way to do this without running executable code that had to 
pass a previous execveat() check, then yeah, it's not effective (e.g. a 
Python interpreter that *doesn't* enforce execveat() is a trivial way to 
do it).

Once arbitrary code is running, all bets are off. So long as all 
arbitrary code is being checked itself, it's allowed to do things that 
would bypass later checks (and it's up to whoever audited it in the 
first place to prevent this by not giving it the special mark that 
allows it to pass the check).

> b> if both user space and kernel need to call AT_CHECK, the faccessat
> seems to be a better place for AT_CHECK, e.g. kernel can call
> do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> avoid complicating the execveat() code path.
> 
> What do you think ?
> 
> Thanks
> -Jeff

Mickaël Salaün July 17, 2024, 10 a.m. UTC | #23

On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> On 17/07/2024 07:33, Jeff Xu wrote:
> > Consider those cases: I think:
> > a> relying purely on userspace for enforcement does't seem to be
> > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > executable memory.
> 
> If there's a way to do this without running executable code that had to pass
> a previous execveat() check, then yeah, it's not effective (e.g. a Python
> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> 
> Once arbitrary code is running, all bets are off. So long as all arbitrary
> code is being checked itself, it's allowed to do things that would bypass
> later checks (and it's up to whoever audited it in the first place to
> prevent this by not giving it the special mark that allows it to pass the
> check).

Exactly.  As explained in the patches, one crucial prerequisite is that
the executable code is trusted, and the system must provide integrity
guarantees.  We cannot do anything without that.  This patches series is
a building block to fix a blind spot on Linux systems to be able to
fully control executability.

Mickaël Salaün July 17, 2024, 10:01 a.m. UTC | #24

On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > allowed for execution.  The main use case is for script interpreters and
> > dynamic linkers to check execution permission according to the kernel's
> > security policy. Another use case is to add context to access logs e.g.,
> > which script (instead of interpreter) accessed a file.  As any
> > executable code, scripts could also use this check [1].
> >
> > This is different than faccessat(2) which only checks file access
> > rights, but not the full context e.g. mount point's noexec, stack limit,
> > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > real execution, user space gets the same error codes.
> >
> So we concluded that execveat(AT_CHECK) will be used to check the
> exec, shared object, script and config file (such as seccomp config),

"config file" that contains executable code.

> I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> different use cases:
> 
> execveat clearly has less code change, but that also means: we can't
> add logic specific to exec (i.e. logic that can't be applied to
> config) for this part (from do_execveat_common to
> security_bprm_creds_for_exec) in future.  This would require some
> agreement/sign-off, I'm not sure from whom.

I'm not sure to follow. We could still add new flags, but for now I
don't see use cases.  This patch series is not meant to handle all
possible "trust checks", only executable code, which makes sense for the
kernel.

If we want other checks, we'll need to clearly define their semantic and
align with the kernel.  faccessat2(2) might be used to check other file
properties, but the executable property is not only defined by the file
attributes.

> 
> --------------------------
> now looked at user cases (focus on elf for now)
> 
> 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
> dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)
> 
> 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
> /usr/bin/some.out will pass AT_CHECK
> 
> 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
> /usr/bin/some.out will pass AT_CHECK, however, it uses a custom
> /tmp/ld.so (I assume this is possible  for elf header will set the
> path for ld.so because kernel has no knowledge of that, and
> binfmt_elf.c allocate memory for ld.so during execveat call)
> 
> 4> dlopen(/tmp/a.so)
> I assume dynamic linker will call execveat(AT_CHECK), before map a.so
> into memory.
> 
> For case 1>
> Alternative solution: Because AT_CHECK is always called, I think we
> can avoid the first AT_CHECK call, and check during execveat(fd),

There is no need to use AT_CHECK if we're going to call execveat(2) on
the same file descriptor.  By design, AT_CHECK is implicit for any
execve(2).

> this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
> benefit is that there is no TOCTOU and save one round trip of syscall
> for a succesful execveat() case.

As long as user space uses the same file descriptor, there is no TOCTOU.

SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines
the user space security policy.  The kernel already enforces the same
security policy for any execve(2), whatever are the calling process's
securebits.

> 
> For case 2>
> dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
> However,  the process can all open then mmap() directly, it seems
> minimal effort for an attacker to walk around such a defence from
> dynamic linker.

Which process?  What do you mean by "can all open then mmap() directly"?

In this context the dynamic linker (like its parent processes) is
trusted (guaranteed by the system).

For case 2, the dynamic linker must check with AT_CHECK all files that
will be mapped, which include /usr/bin/some.out and /tmp/a.so

> 
> Alternative solution:
> dynamic linker call AT_CHECK for each .so, kernel will save the state
> (associated with fd)
> kernel will check fd state at the time of mmap(fd, executable memory)
> and enforce SECBIT_EXEC_RESTRICT_FILE = 1

The idea with AT_CHECK is that there is no kernel side effect, no extra
kernel state, and the semantic is the same as with execve(2).

This also enables us to check file's executable permission and ignore
it, which is useful in a "permissive mode" when preparing for a
migration without breaking a system, or to do extra integrity checks.
BTW, this use case would also be more complex with a new openat2(2) flag
like the original O_MAYEXEC.

> 
> Alternative solution 2:
> a new syscall to load the .so and enforce the AT_CHECK in kernel

A new syscall would be overkill for this feature.  Please see Linus's
comment.

> 
> This also means, for the solution to be complete, we might want to
> block creation of executable anonymous memory (e.g. by seccomp, ),

How seccomp could create anonymous memory in user space?
seccomp filters should be treated (and checked with AT_CHECK) as
executable code anyway.

> unless the user space can harden the creation of  executable anonymous
> memory in some way.

User space is already in charge of mmapping its own memory.  I don't see
what is missing.

> 
> For case 3>
> I think binfmt_elf.c in the kernel needs to check the ld.so to make
> sure it passes AT_CHECK, before loading it into memory.

All ELF dependencies are opened and checked with open_exec(), which
perform the main executability checks (with the __FMODE_EXEC flag).
Did I miss something?

However, we must be careful with programs using the (deprecated)
uselib(2). They should also check with AT_CHECK because this syscall
opens the shared library without __FMODE_EXEC (similar to a simple file
open). See
https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/

> 
> For case 4>
> same as case 2.
> 
> Consider those cases: I think:
> a> relying purely on userspace for enforcement does't seem to be
> effective,  e.g. it is trivial  to call open(), then mmap() it into
> executable memory.

As Steve explained (and is also explained in the patches), it is trivial
if the attacker can already execute its own code, which is too late to
enforce any script execution control.

> b> if both user space and kernel need to call AT_CHECK, the faccessat
> seems to be a better place for AT_CHECK, e.g. kernel can call
> do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> avoid complicating the execveat() code path.

A previous version of this patches series already patched faccessat(2),
but this is not the right place.  faccessat2(2) is dedicated to check
file permissions, not executability (e.g. with mount's noexec).

> 
> What do you think ?

I think there are some misunderstandings.  Please let me know if it's
clearer now.

> 
> Thanks
> -Jeff
> 
> > With the information that a script interpreter is about to interpret a
> > script, an LSM security policy can adjust caller's access rights or log
> > execution request as for native script execution (e.g. role transition).
> > This is possible thanks to the call to security_bprm_creds_for_exec().
> >
> > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > current kernel code should not be a security issue (e.g. unexpected role
> > transition).  LSMs willing to update the caller's credential could now
> > do so when bprm->is_check is set.  Of course, such policy change should
> > be in line with the new user space code.
> >
> > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > make sense for the kernel to parse the checked files, look for
> > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > if the format is unknown.  Because of that, security_bprm_check() is
> > never called when AT_CHECK is used.
> >
> > It should be noted that script interpreters cannot directly use
> > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > executed to interpret the script.  Unlike the kernel, script
> > interpreters may just interpret the shebang as a simple comment, which
> > should not change for backward compatibility reasons.
> >
> > Because scripts or libraries files might not currently have the
> > executable permission set, or because we might want specific users to be
> > allowed to run arbitrary scripts, the following patch provides a dynamic
> > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> >
> > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > This patch has been used for more than a decade with customized script
> > interpreters.  Some examples can be found here:
> > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> >
> > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > Cc: Christian Brauner <brauner@kernel.org>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Paul Moore <paul@paul-moore.com>
> > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> > ---
> >
> > New design since v18:
> > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > ---
> >  fs/exec.c                  |  5 +++--
> >  include/linux/binfmts.h    |  7 ++++++-
> >  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
> >  kernel/audit.h             |  1 +
> >  kernel/auditsc.c           |  1 +
> >  5 files changed, 41 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 40073142288f..ea2a1867afdc 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> >                 .lookup_flags = LOOKUP_FOLLOW,
> >         };
> >
> > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> >                 return ERR_PTR(-EINVAL);
> >         if (flags & AT_SYMLINK_NOFOLLOW)
> >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> >                 bprm->filename = bprm->fdpath;
> >         }
> >         bprm->interp = bprm->filename;
> > +       bprm->is_check = !!(flags & AT_CHECK);
> >
> >         retval = bprm_mm_init(bprm);
> >         if (!retval)
> > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> >
> >         /* Set the unchanging part of bprm->cred */
> >         retval = security_bprm_creds_for_exec(bprm);
> > -       if (retval)
> > +       if (retval || bprm->is_check)
> >                 goto out;
> >
> >         retval = exec_binprm(bprm);
> > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > index 70f97f685bff..8ff9c9e33aed 100644
> > --- a/include/linux/binfmts.h
> > +++ b/include/linux/binfmts.h
> > @@ -42,7 +42,12 @@ struct linux_binprm {
> >                  * Set when errors can no longer be returned to the
> >                  * original userspace.
> >                  */
> > -               point_of_no_return:1;
> > +               point_of_no_return:1,
> > +               /*
> > +                * Set by user space to check executability according to the
> > +                * caller's environment.
> > +                */
> > +               is_check:1;
> >         struct file *executable; /* Executable to pass to the interpreter */
> >         struct file *interpreter;
> >         struct file *file;
> > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > index c0bcc185fa48..bcd05c59b7df 100644
> > --- a/include/uapi/linux/fcntl.h
> > +++ b/include/uapi/linux/fcntl.h
> > @@ -118,6 +118,36 @@
> >  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
> >                                         compare object identity and may not
> >                                         be usable to open_by_handle_at(2) */
> > +
> > +/*
> > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > + * of this file would be allowed, ignoring the file format and then the related
> > + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> > + * thread.  See securebits.h documentation.
> > + *
> > + * Programs should use this check to apply kernel-level checks against files
> > + * that are not directly executed by the kernel but directly passed to a user
> > + * space interpreter instead.  All files that contain executable code, from the
> > + * point of view of the interpreter, should be checked.  The main purpose of
> > + * this flag is to improve the security and consistency of an execution
> > + * environment to ensure that direct file execution (e.g. ./script.sh) and
> > + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> > + * instance, this can be used to check if a file is trustworthy according to
> > + * the caller's environment.
> > + *
> > + * In a secure environment, libraries and any executable dependencies should
> > + * also be checked.  For instance dynamic linking should make sure that all
> > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > + * trusted code should be executable, which also requires integrity guarantees.
> > + *
> > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > + * descriptor instead of a path.
> > + */
> > +#define AT_CHECK               0x10000
> > +
> >  #if defined(__KERNEL__)
> >  #define AT_GETATTR_NOSEC       0x80000000
> >  #endif
> > diff --git a/kernel/audit.h b/kernel/audit.h
> > index a60d2840559e..8ebdabd2ab81 100644
> > --- a/kernel/audit.h
> > +++ b/kernel/audit.h
> > @@ -197,6 +197,7 @@ struct audit_context {
> >                 struct open_how openat2;
> >                 struct {
> >                         int                     argc;
> > +                       bool                    is_check;
> >                 } execve;
> >                 struct {
> >                         char                    *name;
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index 6f0d6fb6523f..b6316e284342 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> >
> >         context->type = AUDIT_EXECVE;
> >         context->execve.argc = bprm->argc;
> > +       context->execve.is_check = bprm->is_check;
> >  }
> >
> >
> > --
> > 2.45.2
> >
>

Andy Lutomirski July 18, 2024, 1:02 a.m. UTC | #25

> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
>>> On 17/07/2024 07:33, Jeff Xu wrote:
>>> Consider those cases: I think:
>>> a> relying purely on userspace for enforcement does't seem to be
>>> effective,  e.g. it is trivial  to call open(), then mmap() it into
>>> executable memory.
>>
>> If there's a way to do this without running executable code that had to pass
>> a previous execveat() check, then yeah, it's not effective (e.g. a Python
>> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
>>
>> Once arbitrary code is running, all bets are off. So long as all arbitrary
>> code is being checked itself, it's allowed to do things that would bypass
>> later checks (and it's up to whoever audited it in the first place to
>> prevent this by not giving it the special mark that allows it to pass the
>> check).
>
> Exactly.  As explained in the patches, one crucial prerequisite is that
> the executable code is trusted, and the system must provide integrity
> guarantees.  We cannot do anything without that.  This patches series is
> a building block to fix a blind spot on Linux systems to be able to
> fully control executability.

Circling back to my previous comment (did that ever get noticed?), I
don’t think this is quite right:

https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/

On a basic system configuration, a given path either may or may not be
executed. And maybe that path has some integrity check (dm-verity,
etc).  So the kernel should tell the interpreter/loader whether the
target may be executed. All fine.

 But I think the more complex cases are more interesting, and the
“execute a program” process IS NOT BINARY.  An attempt to execute can
be rejected outright, or it can be allowed *with a change to creds or
security context*.  It would be entirely reasonable to have a policy
that allows execution of non-integrity-checked files but in a very
locked down context only.

So… shouldn’t a patch series to this effect actually support this?

Jeff Xu July 18, 2024, 1:51 a.m. UTC | #26

On Wed, Jul 17, 2024 at 3:00 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> > On 17/07/2024 07:33, Jeff Xu wrote:
> > > Consider those cases: I think:
> > > a> relying purely on userspace for enforcement does't seem to be
> > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > executable memory.
> >
> > If there's a way to do this without running executable code that had to pass
> > a previous execveat() check, then yeah, it's not effective (e.g. a Python
> > interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> >
> > Once arbitrary code is running, all bets are off. So long as all arbitrary
> > code is being checked itself, it's allowed to do things that would bypass
> > later checks (and it's up to whoever audited it in the first place to
> > prevent this by not giving it the special mark that allows it to pass the
> > check).
>
We will want to define what is considered as "arbitrary code is running"

Using an example of ROP, attackers change the return address in stack,
e.g. direct the execution flow to a gauge to call "ld.so /tmp/a.out",
do you consider "arbitrary code is running" when stack is overwritten
? or after execve() is called.
If it is later, this patch can prevent "ld.so /tmp/a.out".

> Exactly.  As explained in the patches, one crucial prerequisite is that
> the executable code is trusted, and the system must provide integrity
> guarantees.  We cannot do anything without that.  This patches series is
> a building block to fix a blind spot on Linux systems to be able to
> fully control executability.

Even trusted executable can have a bug.

I'm thinking in the context of ChromeOS, where all its system services
are from trusted partitions, and legit code won't load .so from a
non-exec mount.  But we want to sandbox those services, so even under
some kind of ROP attack, the service still won't be able to load .so
from /tmp. Of course, if an attacker can already write arbitrary
length of data into the stack, it is probably already a game over.

Jeff Xu July 18, 2024, 2:08 a.m. UTC | #27

On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > allowed for execution.  The main use case is for script interpreters and
> > > dynamic linkers to check execution permission according to the kernel's
> > > security policy. Another use case is to add context to access logs e.g.,
> > > which script (instead of interpreter) accessed a file.  As any
> > > executable code, scripts could also use this check [1].
> > >
> > > This is different than faccessat(2) which only checks file access
> > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > real execution, user space gets the same error codes.
> > >
> > So we concluded that execveat(AT_CHECK) will be used to check the
> > exec, shared object, script and config file (such as seccomp config),
>
> "config file" that contains executable code.
>
Is seccomp config  considered as "contains executable code", seccomp
config is translated into bpf, so maybe yes ? but bpf is running in
the kernel.

> > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > different use cases:
> >
> > execveat clearly has less code change, but that also means: we can't
> > add logic specific to exec (i.e. logic that can't be applied to
> > config) for this part (from do_execveat_common to
> > security_bprm_creds_for_exec) in future.  This would require some
> > agreement/sign-off, I'm not sure from whom.
>
> I'm not sure to follow. We could still add new flags, but for now I
> don't see use cases.  This patch series is not meant to handle all
> possible "trust checks", only executable code, which makes sense for the
> kernel.
>
I guess the "configfile" discussion is where I get confused, at one
point, I think this would become a generic "trust checks" api for
everything related to "generating executable code", e.g. javascript,
java code, and more.
We will want to clearly define the scope of execveat(AT_CHECK)

> If we want other checks, we'll need to clearly define their semantic and
> align with the kernel.  faccessat2(2) might be used to check other file
> properties, but the executable property is not only defined by the file
> attributes.
>
Agreed.

> >
> > --------------------------
> > now looked at user cases (focus on elf for now)
> >
> > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
> > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)
> >
> > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
> > /usr/bin/some.out will pass AT_CHECK
> >
> > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
> > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom
> > /tmp/ld.so (I assume this is possible  for elf header will set the
> > path for ld.so because kernel has no knowledge of that, and
> > binfmt_elf.c allocate memory for ld.so during execveat call)
> >
> > 4> dlopen(/tmp/a.so)
> > I assume dynamic linker will call execveat(AT_CHECK), before map a.so
> > into memory.
> >
> > For case 1>
> > Alternative solution: Because AT_CHECK is always called, I think we
> > can avoid the first AT_CHECK call, and check during execveat(fd),
>
> There is no need to use AT_CHECK if we're going to call execveat(2) on
> the same file descriptor.  By design, AT_CHECK is implicit for any
> execve(2).
>
Yes. I realized I was wrong to say that ld.so will call execve() for
/tmp/a.out, there is no execve() call, otherwise it would have been
blocked already today.
The ld.so will  mmap the /tmp/a.out directly.  So case 1 is no
different than case 2 and 4.  ( the elf objects are mapped to memory
by dynamic linker.)
I'm not familiar with dynamic linker, Florian is on this thread, and
can help to correct me if my guess is wrong.

> > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
> > benefit is that there is no TOCTOU and save one round trip of syscall
> > for a succesful execveat() case.
>
> As long as user space uses the same file descriptor, there is no TOCTOU.
>
> SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines
> the user space security policy.  The kernel already enforces the same
> security policy for any execve(2), whatever are the calling process's
> securebits.
>
> >
> > For case 2>
> > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
> > However,  the process can all open then mmap() directly, it seems
> > minimal effort for an attacker to walk around such a defence from
> > dynamic linker.
>
> Which process?  What do you mean by "can all open then mmap() directly"?
>
> In this context the dynamic linker (like its parent processes) is
> trusted (guaranteed by the system).
>
> For case 2, the dynamic linker must check with AT_CHECK all files that
> will be mapped, which include /usr/bin/some.out and /tmp/a.so
>
My point is that the process can work around this by mmap() the file directly.

> >
> > Alternative solution:
> > dynamic linker call AT_CHECK for each .so, kernel will save the state
> > (associated with fd)
> > kernel will check fd state at the time of mmap(fd, executable memory)
> > and enforce SECBIT_EXEC_RESTRICT_FILE = 1
>
> The idea with AT_CHECK is that there is no kernel side effect, no extra
> kernel state, and the semantic is the same as with execve(2).
>
> This also enables us to check file's executable permission and ignore
> it, which is useful in a "permissive mode" when preparing for a
> migration without breaking a system, or to do extra integrity checks.
For preparing a migration (detect all violations), this is useful.
But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this
seems to be weak, at least for elf loading case.

> BTW, this use case would also be more complex with a new openat2(2) flag
> like the original O_MAYEXEC.
>
> >
> > Alternative solution 2:
> > a new syscall to load the .so and enforce the AT_CHECK in kernel
>
> A new syscall would be overkill for this feature.  Please see Linus's
> comment.
>
maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap()
to executable memory.

> >
> > This also means, for the solution to be complete, we might want to
> > block creation of executable anonymous memory (e.g. by seccomp, ),
>
> How seccomp could create anonymous memory in user space?
> seccomp filters should be treated (and checked with AT_CHECK) as
> executable code anyway.
>
> > unless the user space can harden the creation of  executable anonymous
> > memory in some way.
>
> User space is already in charge of mmapping its own memory.  I don't see
> what is missing.
>
> >
> > For case 3>
> > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > sure it passes AT_CHECK, before loading it into memory.
>
> All ELF dependencies are opened and checked with open_exec(), which
> perform the main executability checks (with the __FMODE_EXEC flag).
> Did I miss something?
>
I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
The app can choose its own dynamic linker path during build, (maybe
even statically link one ?)  This is another reason that relying on a
userspace only is not enough.

> However, we must be careful with programs using the (deprecated)
> uselib(2). They should also check with AT_CHECK because this syscall
> opens the shared library without __FMODE_EXEC (similar to a simple file
> open). See
> https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/
>
> >
> > For case 4>
> > same as case 2.
> >
> > Consider those cases: I think:
> > a> relying purely on userspace for enforcement does't seem to be
> > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > executable memory.
>
> As Steve explained (and is also explained in the patches), it is trivial
> if the attacker can already execute its own code, which is too late to
> enforce any script execution control.
>
> > b> if both user space and kernel need to call AT_CHECK, the faccessat
> > seems to be a better place for AT_CHECK, e.g. kernel can call
> > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> > avoid complicating the execveat() code path.
>
> A previous version of this patches series already patched faccessat(2),
> but this is not the right place.  faccessat2(2) is dedicated to check
> file permissions, not executability (e.g. with mount's noexec).
>
> >
> > What do you think ?
>
> I think there are some misunderstandings.  Please let me know if it's
> clearer now.
>
I'm still not sure about the user case for dynamic linker (elf
loading) case. Maybe this patch is more suitable for scripts?
A detailed user case will help demonstrate the use case for dynamic
linker, e.g. what kind of app will benefit from
SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
dealing with , what kind of attack chain we blocked as a result.

> >
> > Thanks
> > -Jeff
> >
> > > With the information that a script interpreter is about to interpret a
> > > script, an LSM security policy can adjust caller's access rights or log
> > > execution request as for native script execution (e.g. role transition).
> > > This is possible thanks to the call to security_bprm_creds_for_exec().
> > >
> > > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > > current kernel code should not be a security issue (e.g. unexpected role
> > > transition).  LSMs willing to update the caller's credential could now
> > > do so when bprm->is_check is set.  Of course, such policy change should
> > > be in line with the new user space code.
> > >
> > > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > > make sense for the kernel to parse the checked files, look for
> > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > if the format is unknown.  Because of that, security_bprm_check() is
> > > never called when AT_CHECK is used.
> > >
> > > It should be noted that script interpreters cannot directly use
> > > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > > executed to interpret the script.  Unlike the kernel, script
> > > interpreters may just interpret the shebang as a simple comment, which
> > > should not change for backward compatibility reasons.
> > >
> > > Because scripts or libraries files might not currently have the
> > > executable permission set, or because we might want specific users to be
> > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> > >
> > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > This patch has been used for more than a decade with customized script
> > > interpreters.  Some examples can be found here:
> > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > >
> > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Cc: Kees Cook <keescook@chromium.org>
> > > Cc: Paul Moore <paul@paul-moore.com>
> > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> > > ---
> > >
> > > New design since v18:
> > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > ---
> > >  fs/exec.c                  |  5 +++--
> > >  include/linux/binfmts.h    |  7 ++++++-
> > >  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
> > >  kernel/audit.h             |  1 +
> > >  kernel/auditsc.c           |  1 +
> > >  5 files changed, 41 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/exec.c b/fs/exec.c
> > > index 40073142288f..ea2a1867afdc 100644
> > > --- a/fs/exec.c
> > > +++ b/fs/exec.c
> > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > >                 .lookup_flags = LOOKUP_FOLLOW,
> > >         };
> > >
> > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> > >                 return ERR_PTR(-EINVAL);
> > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > >                 bprm->filename = bprm->fdpath;
> > >         }
> > >         bprm->interp = bprm->filename;
> > > +       bprm->is_check = !!(flags & AT_CHECK);
> > >
> > >         retval = bprm_mm_init(bprm);
> > >         if (!retval)
> > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > >
> > >         /* Set the unchanging part of bprm->cred */
> > >         retval = security_bprm_creds_for_exec(bprm);
> > > -       if (retval)
> > > +       if (retval || bprm->is_check)
> > >                 goto out;
> > >
> > >         retval = exec_binprm(bprm);
> > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > index 70f97f685bff..8ff9c9e33aed 100644
> > > --- a/include/linux/binfmts.h
> > > +++ b/include/linux/binfmts.h
> > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > >                  * Set when errors can no longer be returned to the
> > >                  * original userspace.
> > >                  */
> > > -               point_of_no_return:1;
> > > +               point_of_no_return:1,
> > > +               /*
> > > +                * Set by user space to check executability according to the
> > > +                * caller's environment.
> > > +                */
> > > +               is_check:1;
> > >         struct file *executable; /* Executable to pass to the interpreter */
> > >         struct file *interpreter;
> > >         struct file *file;
> > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > index c0bcc185fa48..bcd05c59b7df 100644
> > > --- a/include/uapi/linux/fcntl.h
> > > +++ b/include/uapi/linux/fcntl.h
> > > @@ -118,6 +118,36 @@
> > >  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
> > >                                         compare object identity and may not
> > >                                         be usable to open_by_handle_at(2) */
> > > +
> > > +/*
> > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > > + * of this file would be allowed, ignoring the file format and then the related
> > > + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> > > + * thread.  See securebits.h documentation.
> > > + *
> > > + * Programs should use this check to apply kernel-level checks against files
> > > + * that are not directly executed by the kernel but directly passed to a user
> > > + * space interpreter instead.  All files that contain executable code, from the
> > > + * point of view of the interpreter, should be checked.  The main purpose of
> > > + * this flag is to improve the security and consistency of an execution
> > > + * environment to ensure that direct file execution (e.g. ./script.sh) and
> > > + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> > > + * instance, this can be used to check if a file is trustworthy according to
> > > + * the caller's environment.
> > > + *
> > > + * In a secure environment, libraries and any executable dependencies should
> > > + * also be checked.  For instance dynamic linking should make sure that all
> > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > > + * trusted code should be executable, which also requires integrity guarantees.
> > > + *
> > > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > > + * descriptor instead of a path.
> > > + */
> > > +#define AT_CHECK               0x10000
> > > +
> > >  #if defined(__KERNEL__)
> > >  #define AT_GETATTR_NOSEC       0x80000000
> > >  #endif
> > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > index a60d2840559e..8ebdabd2ab81 100644
> > > --- a/kernel/audit.h
> > > +++ b/kernel/audit.h
> > > @@ -197,6 +197,7 @@ struct audit_context {
> > >                 struct open_how openat2;
> > >                 struct {
> > >                         int                     argc;
> > > +                       bool                    is_check;
> > >                 } execve;
> > >                 struct {
> > >                         char                    *name;
> > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > index 6f0d6fb6523f..b6316e284342 100644
> > > --- a/kernel/auditsc.c
> > > +++ b/kernel/auditsc.c
> > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > >
> > >         context->type = AUDIT_EXECVE;
> > >         context->execve.argc = bprm->argc;
> > > +       context->execve.is_check = bprm->is_check;
> > >  }
> > >
> > >
> > > --
> > > 2.45.2
> > >
> >

Mickaël Salaün July 18, 2024, 12:22 p.m. UTC | #28

On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote:
> > On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> >>> On 17/07/2024 07:33, Jeff Xu wrote:
> >>> Consider those cases: I think:
> >>> a> relying purely on userspace for enforcement does't seem to be
> >>> effective,  e.g. it is trivial  to call open(), then mmap() it into
> >>> executable memory.
> >>
> >> If there's a way to do this without running executable code that had to pass
> >> a previous execveat() check, then yeah, it's not effective (e.g. a Python
> >> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> >>
> >> Once arbitrary code is running, all bets are off. So long as all arbitrary
> >> code is being checked itself, it's allowed to do things that would bypass
> >> later checks (and it's up to whoever audited it in the first place to
> >> prevent this by not giving it the special mark that allows it to pass the
> >> check).
> >
> > Exactly.  As explained in the patches, one crucial prerequisite is that
> > the executable code is trusted, and the system must provide integrity
> > guarantees.  We cannot do anything without that.  This patches series is
> > a building block to fix a blind spot on Linux systems to be able to
> > fully control executability.
> 
> Circling back to my previous comment (did that ever get noticed?), I

Yes, I replied to your comments.  Did I miss something?

> don’t think this is quite right:
> 
> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/
> 
> On a basic system configuration, a given path either may or may not be
> executed. And maybe that path has some integrity check (dm-verity,
> etc).  So the kernel should tell the interpreter/loader whether the
> target may be executed. All fine.
> 
>  But I think the more complex cases are more interesting, and the
> “execute a program” process IS NOT BINARY.  An attempt to execute can
> be rejected outright, or it can be allowed *with a change to creds or
> security context*.  It would be entirely reasonable to have a policy
> that allows execution of non-integrity-checked files but in a very
> locked down context only.

I guess you mean to transition to a sandbox when executing an untrusted
file.  This is a good idea.  I talked about role transition in the
patch's description:

With the information that a script interpreter is about to interpret a
script, an LSM security policy can adjust caller's access rights or log
execution request as for native script execution (e.g. role transition).
This is possible thanks to the call to security_bprm_creds_for_exec().

> 
> So… shouldn’t a patch series to this effect actually support this?
> 

This patch series brings the minimal building blocks to have a
consistent execution environment.  Role transitions for script execution
are left to LSMs.  For instance, we could extend Landlock to
automatically sandbox untrusted scripts.

Mickaël Salaün July 18, 2024, 12:23 p.m. UTC | #29

On Wed, Jul 17, 2024 at 06:51:11PM -0700, Jeff Xu wrote:
> On Wed, Jul 17, 2024 at 3:00 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> > > On 17/07/2024 07:33, Jeff Xu wrote:
> > > > Consider those cases: I think:
> > > > a> relying purely on userspace for enforcement does't seem to be
> > > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > > executable memory.
> > >
> > > If there's a way to do this without running executable code that had to pass
> > > a previous execveat() check, then yeah, it's not effective (e.g. a Python
> > > interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> > >
> > > Once arbitrary code is running, all bets are off. So long as all arbitrary
> > > code is being checked itself, it's allowed to do things that would bypass
> > > later checks (and it's up to whoever audited it in the first place to
> > > prevent this by not giving it the special mark that allows it to pass the
> > > check).
> >
> We will want to define what is considered as "arbitrary code is running"
> 
> Using an example of ROP, attackers change the return address in stack,
> e.g. direct the execution flow to a gauge to call "ld.so /tmp/a.out",
> do you consider "arbitrary code is running" when stack is overwritten
> ? or after execve() is called.

Yes, ROP is arbitrary code execution (which can be mitigated with CFI).
ROP could be enough to interpret custom commands and create a small
interpreter/VM.

> If it is later, this patch can prevent "ld.so /tmp/a.out".
> 
> > Exactly.  As explained in the patches, one crucial prerequisite is that
> > the executable code is trusted, and the system must provide integrity
> > guarantees.  We cannot do anything without that.  This patches series is
> > a building block to fix a blind spot on Linux systems to be able to
> > fully control executability.
> 
> Even trusted executable can have a bug.

Definitely, but this patch series is dedicated to script execution
control.

> 
> I'm thinking in the context of ChromeOS, where all its system services
> are from trusted partitions, and legit code won't load .so from a
> non-exec mount.  But we want to sandbox those services, so even under
> some kind of ROP attack, the service still won't be able to load .so
> from /tmp. Of course, if an attacker can already write arbitrary
> length of data into the stack, it is probably already a game over.
> 

OK, you want to tie executable file permission to mmap.  That makes
sense if you have a consistent execution model.  This can be enforced by
LSMs.  Contrary to script interpretation which is a full user space
implementation (and then controlled by user space), mmap restrictions
should indeed be enforced by the kernel.

Mickaël Salaün July 18, 2024, 12:24 p.m. UTC | #30

On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > allowed for execution.  The main use case is for script interpreters and
> > > > dynamic linkers to check execution permission according to the kernel's
> > > > security policy. Another use case is to add context to access logs e.g.,
> > > > which script (instead of interpreter) accessed a file.  As any
> > > > executable code, scripts could also use this check [1].
> > > >
> > > > This is different than faccessat(2) which only checks file access
> > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > real execution, user space gets the same error codes.
> > > >
> > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > exec, shared object, script and config file (such as seccomp config),
> >
> > "config file" that contains executable code.
> >
> Is seccomp config  considered as "contains executable code", seccomp
> config is translated into bpf, so maybe yes ? but bpf is running in
> the kernel.

Because seccomp filters alter syscalls, they are similar to code
injection.

> 
> > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > different use cases:
> > >
> > > execveat clearly has less code change, but that also means: we can't
> > > add logic specific to exec (i.e. logic that can't be applied to
> > > config) for this part (from do_execveat_common to
> > > security_bprm_creds_for_exec) in future.  This would require some
> > > agreement/sign-off, I'm not sure from whom.
> >
> > I'm not sure to follow. We could still add new flags, but for now I
> > don't see use cases.  This patch series is not meant to handle all
> > possible "trust checks", only executable code, which makes sense for the
> > kernel.
> >
> I guess the "configfile" discussion is where I get confused, at one
> point, I think this would become a generic "trust checks" api for
> everything related to "generating executable code", e.g. javascript,
> java code, and more.
> We will want to clearly define the scope of execveat(AT_CHECK)

The line between data and code is blurry.  For instance, a configuration
file can impact the execution flow of a program.  So, where to draw the
line?

It might makes sense to follow the kernel and interpreter semantic: if a
file can be executed by the kernel (e.g. ELF binary, file containing a
shebang, or just configured with binfmt_misc), then this should be
considered as executable code.  This applies to Bash, Python,
Javascript, NodeJS, PE, PHP...  However, we can also make a picture
executable with binfmt_misc.  So, again, where to draw the line?

I'd recommend to think about interaction with the outside, through
function calls, IPCs, syscalls...  For instance, "running" an image
should not lead to reading or writing to arbitrary files, or accessing
the network, but in practice it is legitimate for some file formats...
PostScript is a programming language, but mostly used to draw pictures.
So, again, where to draw the line?

We should follow the principle of least astonishment.  What most users
would expect?  This should follow the *common usage* of executable
files.  At the end, the script interpreters will be patched by security
folks for security reasons.  I think the right question to ask should
be: could this file format be (ab)used to leak or modify arbitrary
files, or to perform arbitrary syscalls?  If the answer is yes, then it
should be checked for executability.  Of course, this excludes bugs
exploited in the file format parser.

I'll extend the next patch series with this rationale.

> 
> > If we want other checks, we'll need to clearly define their semantic and
> > align with the kernel.  faccessat2(2) might be used to check other file
> > properties, but the executable property is not only defined by the file
> > attributes.
> >
> Agreed.
> 
> > >
> > > --------------------------
> > > now looked at user cases (focus on elf for now)
> > >
> > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
> > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)
> > >
> > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
> > > /usr/bin/some.out will pass AT_CHECK
> > >
> > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
> > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom
> > > /tmp/ld.so (I assume this is possible  for elf header will set the
> > > path for ld.so because kernel has no knowledge of that, and
> > > binfmt_elf.c allocate memory for ld.so during execveat call)
> > >
> > > 4> dlopen(/tmp/a.so)
> > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so
> > > into memory.
> > >
> > > For case 1>
> > > Alternative solution: Because AT_CHECK is always called, I think we
> > > can avoid the first AT_CHECK call, and check during execveat(fd),
> >
> > There is no need to use AT_CHECK if we're going to call execveat(2) on
> > the same file descriptor.  By design, AT_CHECK is implicit for any
> > execve(2).
> >
> Yes. I realized I was wrong to say that ld.so will call execve() for
> /tmp/a.out, there is no execve() call, otherwise it would have been
> blocked already today.
> The ld.so will  mmap the /tmp/a.out directly.  So case 1 is no
> different than case 2 and 4.  ( the elf objects are mapped to memory
> by dynamic linker.)
> I'm not familiar with dynamic linker, Florian is on this thread, and
> can help to correct me if my guess is wrong.
> 
> > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
> > > benefit is that there is no TOCTOU and save one round trip of syscall
> > > for a succesful execveat() case.
> >
> > As long as user space uses the same file descriptor, there is no TOCTOU.
> >
> > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines
> > the user space security policy.  The kernel already enforces the same
> > security policy for any execve(2), whatever are the calling process's
> > securebits.
> >
> > >
> > > For case 2>
> > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
> > > However,  the process can all open then mmap() directly, it seems
> > > minimal effort for an attacker to walk around such a defence from
> > > dynamic linker.
> >
> > Which process?  What do you mean by "can all open then mmap() directly"?
> >
> > In this context the dynamic linker (like its parent processes) is
> > trusted (guaranteed by the system).
> >
> > For case 2, the dynamic linker must check with AT_CHECK all files that
> > will be mapped, which include /usr/bin/some.out and /tmp/a.so
> >
> My point is that the process can work around this by mmap() the file directly.

Yes, see my answer in the other email. The process is trusted.

> 
> > >
> > > Alternative solution:
> > > dynamic linker call AT_CHECK for each .so, kernel will save the state
> > > (associated with fd)
> > > kernel will check fd state at the time of mmap(fd, executable memory)
> > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1
> >
> > The idea with AT_CHECK is that there is no kernel side effect, no extra
> > kernel state, and the semantic is the same as with execve(2).
> >
> > This also enables us to check file's executable permission and ignore
> > it, which is useful in a "permissive mode" when preparing for a
> > migration without breaking a system, or to do extra integrity checks.
> For preparing a migration (detect all violations), this is useful.
> But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this
> seems to be weak, at least for elf loading case.

We could add more restrictions, but that is outside the scope of this
patch series.

> 
> > BTW, this use case would also be more complex with a new openat2(2) flag
> > like the original O_MAYEXEC.
> >
> > >
> > > Alternative solution 2:
> > > a new syscall to load the .so and enforce the AT_CHECK in kernel
> >
> > A new syscall would be overkill for this feature.  Please see Linus's
> > comment.
> >
> maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap()
> to executable memory.

OK, this is another story.

> 
> > >
> > > This also means, for the solution to be complete, we might want to
> > > block creation of executable anonymous memory (e.g. by seccomp, ),
> >
> > How seccomp could create anonymous memory in user space?
> > seccomp filters should be treated (and checked with AT_CHECK) as
> > executable code anyway.
> >
> > > unless the user space can harden the creation of  executable anonymous
> > > memory in some way.
> >
> > User space is already in charge of mmapping its own memory.  I don't see
> > what is missing.
> >
> > >
> > > For case 3>
> > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > sure it passes AT_CHECK, before loading it into memory.
> >
> > All ELF dependencies are opened and checked with open_exec(), which
> > perform the main executability checks (with the __FMODE_EXEC flag).
> > Did I miss something?
> >
> I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> The app can choose its own dynamic linker path during build, (maybe
> even statically link one ?)  This is another reason that relying on a
> userspace only is not enough.

The kernel calls open_exec() on all dependencies, including
ld-linux-x86-64.so.2, so these files are checked for executability too.

> 
> > However, we must be careful with programs using the (deprecated)
> > uselib(2). They should also check with AT_CHECK because this syscall
> > opens the shared library without __FMODE_EXEC (similar to a simple file
> > open). See
> > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/
> >
> > >
> > > For case 4>
> > > same as case 2.
> > >
> > > Consider those cases: I think:
> > > a> relying purely on userspace for enforcement does't seem to be
> > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > executable memory.
> >
> > As Steve explained (and is also explained in the patches), it is trivial
> > if the attacker can already execute its own code, which is too late to
> > enforce any script execution control.
> >
> > > b> if both user space and kernel need to call AT_CHECK, the faccessat
> > > seems to be a better place for AT_CHECK, e.g. kernel can call
> > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> > > avoid complicating the execveat() code path.
> >
> > A previous version of this patches series already patched faccessat(2),
> > but this is not the right place.  faccessat2(2) is dedicated to check
> > file permissions, not executability (e.g. with mount's noexec).
> >
> > >
> > > What do you think ?
> >
> > I think there are some misunderstandings.  Please let me know if it's
> > clearer now.
> >
> I'm still not sure about the user case for dynamic linker (elf
> loading) case. Maybe this patch is more suitable for scripts?

It's suitable for both, but we could add more restriction on mmap
with an (existing) LSM.  The kernel already checks for mount's noexec
when mapping a file, but not for the file permission, which is OK
because it could be bypassed by coping the content of the file and
mprotecting it anyway.  For a consistent memory execution control, all
memory mapping need to be restricted, which is out of scope for this
patch series.

> A detailed user case will help demonstrate the use case for dynamic
> linker, e.g. what kind of app will benefit from
> SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
> dealing with , what kind of attack chain we blocked as a result.

I explained that in the patches and in the description of these new
securebits.  Please point which part is not clear.  The full threat
model is simple: the TCB includes the kernel and system's files, which
are integrity-protected, but we don't trust arbitrary data/scripts that
can be written to user-owned files or directly provided to script
interpreters.  As for the ptrace restrictions, the dynamic linker
restrictions helps to avoid trivial bypasses (e.g. with LD_PRELOAD)
with consistent executability checks.

> 
> > >
> > > Thanks
> > > -Jeff
> > >
> > > > With the information that a script interpreter is about to interpret a
> > > > script, an LSM security policy can adjust caller's access rights or log
> > > > execution request as for native script execution (e.g. role transition).
> > > > This is possible thanks to the call to security_bprm_creds_for_exec().
> > > >
> > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > > > current kernel code should not be a security issue (e.g. unexpected role
> > > > transition).  LSMs willing to update the caller's credential could now
> > > > do so when bprm->is_check is set.  Of course, such policy change should
> > > > be in line with the new user space code.
> > > >
> > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > > > make sense for the kernel to parse the checked files, look for
> > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > never called when AT_CHECK is used.
> > > >
> > > > It should be noted that script interpreters cannot directly use
> > > > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > > > executed to interpret the script.  Unlike the kernel, script
> > > > interpreters may just interpret the shebang as a simple comment, which
> > > > should not change for backward compatibility reasons.
> > > >
> > > > Because scripts or libraries files might not currently have the
> > > > executable permission set, or because we might want specific users to be
> > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > > > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> > > >
> > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > This patch has been used for more than a decade with customized script
> > > > interpreters.  Some examples can be found here:
> > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > >
> > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> > > > ---
> > > >
> > > > New design since v18:
> > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > ---
> > > >  fs/exec.c                  |  5 +++--
> > > >  include/linux/binfmts.h    |  7 ++++++-
> > > >  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
> > > >  kernel/audit.h             |  1 +
> > > >  kernel/auditsc.c           |  1 +
> > > >  5 files changed, 41 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > index 40073142288f..ea2a1867afdc 100644
> > > > --- a/fs/exec.c
> > > > +++ b/fs/exec.c
> > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > > >                 .lookup_flags = LOOKUP_FOLLOW,
> > > >         };
> > > >
> > > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> > > >                 return ERR_PTR(-EINVAL);
> > > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > > >                 bprm->filename = bprm->fdpath;
> > > >         }
> > > >         bprm->interp = bprm->filename;
> > > > +       bprm->is_check = !!(flags & AT_CHECK);
> > > >
> > > >         retval = bprm_mm_init(bprm);
> > > >         if (!retval)
> > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > > >
> > > >         /* Set the unchanging part of bprm->cred */
> > > >         retval = security_bprm_creds_for_exec(bprm);
> > > > -       if (retval)
> > > > +       if (retval || bprm->is_check)
> > > >                 goto out;
> > > >
> > > >         retval = exec_binprm(bprm);
> > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > > index 70f97f685bff..8ff9c9e33aed 100644
> > > > --- a/include/linux/binfmts.h
> > > > +++ b/include/linux/binfmts.h
> > > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > > >                  * Set when errors can no longer be returned to the
> > > >                  * original userspace.
> > > >                  */
> > > > -               point_of_no_return:1;
> > > > +               point_of_no_return:1,
> > > > +               /*
> > > > +                * Set by user space to check executability according to the
> > > > +                * caller's environment.
> > > > +                */
> > > > +               is_check:1;
> > > >         struct file *executable; /* Executable to pass to the interpreter */
> > > >         struct file *interpreter;
> > > >         struct file *file;
> > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > > index c0bcc185fa48..bcd05c59b7df 100644
> > > > --- a/include/uapi/linux/fcntl.h
> > > > +++ b/include/uapi/linux/fcntl.h
> > > > @@ -118,6 +118,36 @@
> > > >  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
> > > >                                         compare object identity and may not
> > > >                                         be usable to open_by_handle_at(2) */
> > > > +
> > > > +/*
> > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > > > + * of this file would be allowed, ignoring the file format and then the related
> > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> > > > + * thread.  See securebits.h documentation.
> > > > + *
> > > > + * Programs should use this check to apply kernel-level checks against files
> > > > + * that are not directly executed by the kernel but directly passed to a user
> > > > + * space interpreter instead.  All files that contain executable code, from the
> > > > + * point of view of the interpreter, should be checked.  The main purpose of
> > > > + * this flag is to improve the security and consistency of an execution
> > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and
> > > > + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> > > > + * instance, this can be used to check if a file is trustworthy according to
> > > > + * the caller's environment.
> > > > + *
> > > > + * In a secure environment, libraries and any executable dependencies should
> > > > + * also be checked.  For instance dynamic linking should make sure that all
> > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > > > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > > > + * trusted code should be executable, which also requires integrity guarantees.
> > > > + *
> > > > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > > > + * descriptor instead of a path.
> > > > + */
> > > > +#define AT_CHECK               0x10000
> > > > +
> > > >  #if defined(__KERNEL__)
> > > >  #define AT_GETATTR_NOSEC       0x80000000
> > > >  #endif
> > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > index a60d2840559e..8ebdabd2ab81 100644
> > > > --- a/kernel/audit.h
> > > > +++ b/kernel/audit.h
> > > > @@ -197,6 +197,7 @@ struct audit_context {
> > > >                 struct open_how openat2;
> > > >                 struct {
> > > >                         int                     argc;
> > > > +                       bool                    is_check;
> > > >                 } execve;
> > > >                 struct {
> > > >                         char                    *name;
> > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > index 6f0d6fb6523f..b6316e284342 100644
> > > > --- a/kernel/auditsc.c
> > > > +++ b/kernel/auditsc.c
> > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > >
> > > >         context->type = AUDIT_EXECVE;
> > > >         context->execve.argc = bprm->argc;
> > > > +       context->execve.is_check = bprm->is_check;
> > > >  }
> > > >
> > > >
> > > > --
> > > > 2.45.2
> > > >
> > >
>

James Bottomley July 18, 2024, 1:03 p.m. UTC | #31

On Thu, 2024-07-18 at 14:24 +0200, Mickaël Salaün wrote:
> On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net>
> > wrote:
> > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
[...]
> > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK)
> > > > in different use cases:
> > > > 
> > > > execveat clearly has less code change, but that also means: we
> > > > can't add logic specific to exec (i.e. logic that can't be
> > > > applied to config) for this part (from do_execveat_common to
> > > > security_bprm_creds_for_exec) in future.  This would require
> > > > some agreement/sign-off, I'm not sure from whom.
> > > 
> > > I'm not sure to follow. We could still add new flags, but for now
> > > I don't see use cases.  This patch series is not meant to handle
> > > all possible "trust checks", only executable code, which makes
> > > sense for the kernel.
> > > 
> > I guess the "configfile" discussion is where I get confused, at one
> > point, I think this would become a generic "trust checks" api for
> > everything related to "generating executable code", e.g.
> > javascript, java code, and more. We will want to clearly define the
> > scope of execveat(AT_CHECK)
> 
> The line between data and code is blurry.  For instance, a
> configuration file can impact the execution flow of a program.  So,
> where to draw the line?

Having a way to have config files part of the trusted envelope, either
by signing or measurement would be really useful.  The current standard
distro IMA deployment is signed executables, but not signed config
because it's hard to construct a policy that doesn't force the signing
of too many extraneous files (and files which might change often).

> It might makes sense to follow the kernel and interpreter semantic:
> if a file can be executed by the kernel (e.g. ELF binary, file
> containing a shebang, or just configured with binfmt_misc), then this
> should be considered as executable code.  This applies to Bash,
> Python, Javascript, NodeJS, PE, PHP...  However, we can also make a
> picture executable with binfmt_misc.  So, again, where to draw the
> line?

Possibly by making open for config an indication executables can give?
I'm not advocating doing it in this patch, but if we had an open for
config indication, the LSMs could do much finer grained policy,
especially if they knew which executable was trying to open the config
file.  It would allow things like an IMA policy saying if a signed
executable is opening a config file, then that file must also be
signed.

James

enh July 18, 2024, 2:46 p.m. UTC | #32

On Wed, Jul 17, 2024 at 10:08 PM Jeff Xu <jeffxu@google.com> wrote:
>
> On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > allowed for execution.  The main use case is for script interpreters and
> > > > dynamic linkers to check execution permission according to the kernel's
> > > > security policy. Another use case is to add context to access logs e.g.,
> > > > which script (instead of interpreter) accessed a file.  As any
> > > > executable code, scripts could also use this check [1].
> > > >
> > > > This is different than faccessat(2) which only checks file access
> > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > real execution, user space gets the same error codes.
> > > >
> > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > exec, shared object, script and config file (such as seccomp config),
> >
> > "config file" that contains executable code.
> >
> Is seccomp config  considered as "contains executable code", seccomp
> config is translated into bpf, so maybe yes ? but bpf is running in
> the kernel.
>
> > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > different use cases:
> > >
> > > execveat clearly has less code change, but that also means: we can't
> > > add logic specific to exec (i.e. logic that can't be applied to
> > > config) for this part (from do_execveat_common to
> > > security_bprm_creds_for_exec) in future.  This would require some
> > > agreement/sign-off, I'm not sure from whom.
> >
> > I'm not sure to follow. We could still add new flags, but for now I
> > don't see use cases.  This patch series is not meant to handle all
> > possible "trust checks", only executable code, which makes sense for the
> > kernel.
> >
> I guess the "configfile" discussion is where I get confused, at one
> point, I think this would become a generic "trust checks" api for
> everything related to "generating executable code", e.g. javascript,
> java code, and more.
> We will want to clearly define the scope of execveat(AT_CHECK)
>
> > If we want other checks, we'll need to clearly define their semantic and
> > align with the kernel.  faccessat2(2) might be used to check other file
> > properties, but the executable property is not only defined by the file
> > attributes.
> >
> Agreed.
>
> > >
> > > --------------------------
> > > now looked at user cases (focus on elf for now)
> > >
> > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
> > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)
> > >
> > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
> > > /usr/bin/some.out will pass AT_CHECK
> > >
> > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
> > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom
> > > /tmp/ld.so (I assume this is possible  for elf header will set the
> > > path for ld.so because kernel has no knowledge of that, and
> > > binfmt_elf.c allocate memory for ld.so during execveat call)
> > >
> > > 4> dlopen(/tmp/a.so)
> > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so
> > > into memory.
> > >
> > > For case 1>
> > > Alternative solution: Because AT_CHECK is always called, I think we
> > > can avoid the first AT_CHECK call, and check during execveat(fd),
> >
> > There is no need to use AT_CHECK if we're going to call execveat(2) on
> > the same file descriptor.  By design, AT_CHECK is implicit for any
> > execve(2).
> >
> Yes. I realized I was wrong to say that ld.so will call execve() for
> /tmp/a.out, there is no execve() call, otherwise it would have been
> blocked already today.
> The ld.so will  mmap the /tmp/a.out directly.  So case 1 is no
> different than case 2 and 4.  ( the elf objects are mapped to memory
> by dynamic linker.)
> I'm not familiar with dynamic linker, Florian is on this thread, and
> can help to correct me if my guess is wrong.

for Android, this has been the nail in the coffin of previous attempts
to disallow running code from non-trusted filesystems --- instead of
execing /tmp/a.out, the attacker just execs the linker with /tmp/a.out
as an argument. people are doing this already in some cases, because
we already have ineffectual "barriers" in place. [the usual argument
for doing such things anyway is "it makes it harder to be doing this
by _accident_".]

the other workaround for the attacker is to copy and paste the entire
dynamic linker source and change the bits they don't like :-) (if
you're thinking "is that a thing?", yes, so much so that the idea has
been independently reinvented multiple times by several major legit
apps and by basically every piece of DRM middleware. which is why --
although i'm excited by mseal(2) -- i expect to face significant
challenges rolling it out in Android _especially_ in places like
"dynamic linker internal data structures" where i've wanted it for
years!)

this proposal feels like it _ought_ to let a defender tighten their
seccomp filter to require a "safe" fd if i'm using mmap() with an fd,
but in practice as long as JITs exist i can always just copy code into
a non-fd-backed mmap() region. and -- from the perspective of Android,
where all "apps" are code loaded into a Java runtime -- there's not
much getting away from JITs. (and last i looked, ART -- Android's Java
runtime -- uses memfd() for the JIT cache.)

> > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
> > > benefit is that there is no TOCTOU and save one round trip of syscall
> > > for a succesful execveat() case.
> >
> > As long as user space uses the same file descriptor, there is no TOCTOU.
> >
> > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines
> > the user space security policy.  The kernel already enforces the same
> > security policy for any execve(2), whatever are the calling process's
> > securebits.
> >
> > >
> > > For case 2>
> > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
> > > However,  the process can all open then mmap() directly, it seems
> > > minimal effort for an attacker to walk around such a defence from
> > > dynamic linker.
> >
> > Which process?  What do you mean by "can all open then mmap() directly"?
> >
> > In this context the dynamic linker (like its parent processes) is
> > trusted (guaranteed by the system).
> >
> > For case 2, the dynamic linker must check with AT_CHECK all files that
> > will be mapped, which include /usr/bin/some.out and /tmp/a.so
> >
> My point is that the process can work around this by mmap() the file directly.
>
> > >
> > > Alternative solution:
> > > dynamic linker call AT_CHECK for each .so, kernel will save the state
> > > (associated with fd)
> > > kernel will check fd state at the time of mmap(fd, executable memory)
> > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1
> >
> > The idea with AT_CHECK is that there is no kernel side effect, no extra
> > kernel state, and the semantic is the same as with execve(2).
> >
> > This also enables us to check file's executable permission and ignore
> > it, which is useful in a "permissive mode" when preparing for a
> > migration without breaking a system, or to do extra integrity checks.
> For preparing a migration (detect all violations), this is useful.
> But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this
> seems to be weak, at least for elf loading case.
>
> > BTW, this use case would also be more complex with a new openat2(2) flag
> > like the original O_MAYEXEC.
> >
> > >
> > > Alternative solution 2:
> > > a new syscall to load the .so and enforce the AT_CHECK in kernel
> >
> > A new syscall would be overkill for this feature.  Please see Linus's
> > comment.
> >
> maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap()
> to executable memory.
>
> > >
> > > This also means, for the solution to be complete, we might want to
> > > block creation of executable anonymous memory (e.g. by seccomp, ),
> >
> > How seccomp could create anonymous memory in user space?
> > seccomp filters should be treated (and checked with AT_CHECK) as
> > executable code anyway.
> >
> > > unless the user space can harden the creation of  executable anonymous
> > > memory in some way.
> >
> > User space is already in charge of mmapping its own memory.  I don't see
> > what is missing.
> >
> > >
> > > For case 3>
> > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > sure it passes AT_CHECK, before loading it into memory.
> >
> > All ELF dependencies are opened and checked with open_exec(), which
> > perform the main executability checks (with the __FMODE_EXEC flag).
> > Did I miss something?
> >
> I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> The app can choose its own dynamic linker path during build, (maybe
> even statically link one ?)  This is another reason that relying on a
> userspace only is not enough.
>
> > However, we must be careful with programs using the (deprecated)
> > uselib(2). They should also check with AT_CHECK because this syscall
> > opens the shared library without __FMODE_EXEC (similar to a simple file
> > open). See
> > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/
> >
> > >
> > > For case 4>
> > > same as case 2.
> > >
> > > Consider those cases: I think:
> > > a> relying purely on userspace for enforcement does't seem to be
> > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > executable memory.
> >
> > As Steve explained (and is also explained in the patches), it is trivial
> > if the attacker can already execute its own code, which is too late to
> > enforce any script execution control.
> >
> > > b> if both user space and kernel need to call AT_CHECK, the faccessat
> > > seems to be a better place for AT_CHECK, e.g. kernel can call
> > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> > > avoid complicating the execveat() code path.
> >
> > A previous version of this patches series already patched faccessat(2),
> > but this is not the right place.  faccessat2(2) is dedicated to check
> > file permissions, not executability (e.g. with mount's noexec).
> >
> > >
> > > What do you think ?
> >
> > I think there are some misunderstandings.  Please let me know if it's
> > clearer now.
> >
> I'm still not sure about the user case for dynamic linker (elf
> loading) case. Maybe this patch is more suitable for scripts?
> A detailed user case will help demonstrate the use case for dynamic
> linker, e.g. what kind of app will benefit from
> SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
> dealing with , what kind of attack chain we blocked as a result.
>
> > >
> > > Thanks
> > > -Jeff
> > >
> > > > With the information that a script interpreter is about to interpret a
> > > > script, an LSM security policy can adjust caller's access rights or log
> > > > execution request as for native script execution (e.g. role transition).
> > > > This is possible thanks to the call to security_bprm_creds_for_exec().
> > > >
> > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > > > current kernel code should not be a security issue (e.g. unexpected role
> > > > transition).  LSMs willing to update the caller's credential could now
> > > > do so when bprm->is_check is set.  Of course, such policy change should
> > > > be in line with the new user space code.
> > > >
> > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > > > make sense for the kernel to parse the checked files, look for
> > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > never called when AT_CHECK is used.
> > > >
> > > > It should be noted that script interpreters cannot directly use
> > > > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > > > executed to interpret the script.  Unlike the kernel, script
> > > > interpreters may just interpret the shebang as a simple comment, which
> > > > should not change for backward compatibility reasons.
> > > >
> > > > Because scripts or libraries files might not currently have the
> > > > executable permission set, or because we might want specific users to be
> > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > > > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> > > >
> > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > This patch has been used for more than a decade with customized script
> > > > interpreters.  Some examples can be found here:
> > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > >
> > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > Cc: Kees Cook <keescook@chromium.org>
> > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> > > > ---
> > > >
> > > > New design since v18:
> > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > ---
> > > >  fs/exec.c                  |  5 +++--
> > > >  include/linux/binfmts.h    |  7 ++++++-
> > > >  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
> > > >  kernel/audit.h             |  1 +
> > > >  kernel/auditsc.c           |  1 +
> > > >  5 files changed, 41 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > index 40073142288f..ea2a1867afdc 100644
> > > > --- a/fs/exec.c
> > > > +++ b/fs/exec.c
> > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > > >                 .lookup_flags = LOOKUP_FOLLOW,
> > > >         };
> > > >
> > > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> > > >                 return ERR_PTR(-EINVAL);
> > > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > > >                 bprm->filename = bprm->fdpath;
> > > >         }
> > > >         bprm->interp = bprm->filename;
> > > > +       bprm->is_check = !!(flags & AT_CHECK);
> > > >
> > > >         retval = bprm_mm_init(bprm);
> > > >         if (!retval)
> > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > > >
> > > >         /* Set the unchanging part of bprm->cred */
> > > >         retval = security_bprm_creds_for_exec(bprm);
> > > > -       if (retval)
> > > > +       if (retval || bprm->is_check)
> > > >                 goto out;
> > > >
> > > >         retval = exec_binprm(bprm);
> > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > > index 70f97f685bff..8ff9c9e33aed 100644
> > > > --- a/include/linux/binfmts.h
> > > > +++ b/include/linux/binfmts.h
> > > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > > >                  * Set when errors can no longer be returned to the
> > > >                  * original userspace.
> > > >                  */
> > > > -               point_of_no_return:1;
> > > > +               point_of_no_return:1,
> > > > +               /*
> > > > +                * Set by user space to check executability according to the
> > > > +                * caller's environment.
> > > > +                */
> > > > +               is_check:1;
> > > >         struct file *executable; /* Executable to pass to the interpreter */
> > > >         struct file *interpreter;
> > > >         struct file *file;
> > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > > index c0bcc185fa48..bcd05c59b7df 100644
> > > > --- a/include/uapi/linux/fcntl.h
> > > > +++ b/include/uapi/linux/fcntl.h
> > > > @@ -118,6 +118,36 @@
> > > >  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
> > > >                                         compare object identity and may not
> > > >                                         be usable to open_by_handle_at(2) */
> > > > +
> > > > +/*
> > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > > > + * of this file would be allowed, ignoring the file format and then the related
> > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> > > > + * thread.  See securebits.h documentation.
> > > > + *
> > > > + * Programs should use this check to apply kernel-level checks against files
> > > > + * that are not directly executed by the kernel but directly passed to a user
> > > > + * space interpreter instead.  All files that contain executable code, from the
> > > > + * point of view of the interpreter, should be checked.  The main purpose of
> > > > + * this flag is to improve the security and consistency of an execution
> > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and
> > > > + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> > > > + * instance, this can be used to check if a file is trustworthy according to
> > > > + * the caller's environment.
> > > > + *
> > > > + * In a secure environment, libraries and any executable dependencies should
> > > > + * also be checked.  For instance dynamic linking should make sure that all
> > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > > > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > > > + * trusted code should be executable, which also requires integrity guarantees.
> > > > + *
> > > > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > > > + * descriptor instead of a path.
> > > > + */
> > > > +#define AT_CHECK               0x10000
> > > > +
> > > >  #if defined(__KERNEL__)
> > > >  #define AT_GETATTR_NOSEC       0x80000000
> > > >  #endif
> > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > index a60d2840559e..8ebdabd2ab81 100644
> > > > --- a/kernel/audit.h
> > > > +++ b/kernel/audit.h
> > > > @@ -197,6 +197,7 @@ struct audit_context {
> > > >                 struct open_how openat2;
> > > >                 struct {
> > > >                         int                     argc;
> > > > +                       bool                    is_check;
> > > >                 } execve;
> > > >                 struct {
> > > >                         char                    *name;
> > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > index 6f0d6fb6523f..b6316e284342 100644
> > > > --- a/kernel/auditsc.c
> > > > +++ b/kernel/auditsc.c
> > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > >
> > > >         context->type = AUDIT_EXECVE;
> > > >         context->execve.argc = bprm->argc;
> > > > +       context->execve.is_check = bprm->is_check;
> > > >  }
> > > >
> > > >
> > > > --
> > > > 2.45.2
> > > >
> > >

Mickaël Salaün July 18, 2024, 3:35 p.m. UTC | #33

On Thu, Jul 18, 2024 at 09:03:36AM -0400, James Bottomley wrote:
> On Thu, 2024-07-18 at 14:24 +0200, Mickaël Salaün wrote:
> > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net>
> > > wrote:
> > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> [...]
> > > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK)
> > > > > in different use cases:
> > > > > 
> > > > > execveat clearly has less code change, but that also means: we
> > > > > can't add logic specific to exec (i.e. logic that can't be
> > > > > applied to config) for this part (from do_execveat_common to
> > > > > security_bprm_creds_for_exec) in future.  This would require
> > > > > some agreement/sign-off, I'm not sure from whom.
> > > > 
> > > > I'm not sure to follow. We could still add new flags, but for now
> > > > I don't see use cases.  This patch series is not meant to handle
> > > > all possible "trust checks", only executable code, which makes
> > > > sense for the kernel.
> > > > 
> > > I guess the "configfile" discussion is where I get confused, at one
> > > point, I think this would become a generic "trust checks" api for
> > > everything related to "generating executable code", e.g.
> > > javascript, java code, and more. We will want to clearly define the
> > > scope of execveat(AT_CHECK)
> > 
> > The line between data and code is blurry.  For instance, a
> > configuration file can impact the execution flow of a program.  So,
> > where to draw the line?
> 
> Having a way to have config files part of the trusted envelope, either
> by signing or measurement would be really useful.  The current standard
> distro IMA deployment is signed executables, but not signed config
> because it's hard to construct a policy that doesn't force the signing
> of too many extraneous files (and files which might change often).
> 
> > It might makes sense to follow the kernel and interpreter semantic:
> > if a file can be executed by the kernel (e.g. ELF binary, file
> > containing a shebang, or just configured with binfmt_misc), then this
> > should be considered as executable code.  This applies to Bash,
> > Python, Javascript, NodeJS, PE, PHP...  However, we can also make a
> > picture executable with binfmt_misc.  So, again, where to draw the
> > line?
> 
> Possibly by making open for config an indication executables can give?
> I'm not advocating doing it in this patch, but if we had an open for
> config indication, the LSMs could do much finer grained policy,
> especially if they knew which executable was trying to open the config
> file.  It would allow things like an IMA policy saying if a signed
> executable is opening a config file, then that file must also be
> signed.

Checking configuration could be a next step, but not with this patch
series.  FYI, the previous version was a (too) generic syscall:
https://lore.kernel.org/all/20220104155024.48023-1-mic@digikod.net/
One of the main concern was alignment with kernel semantic.  For now,
let's focus on script execution control.

> 
> James
>

Mickaël Salaün July 18, 2024, 3:35 p.m. UTC | #34

On Thu, Jul 18, 2024 at 10:46:54AM -0400, enh wrote:
> On Wed, Jul 17, 2024 at 10:08 PM Jeff Xu <jeffxu@google.com> wrote:
> >
> > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > executable code, scripts could also use this check [1].
> > > > >
> > > > > This is different than faccessat(2) which only checks file access
> > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > real execution, user space gets the same error codes.
> > > > >
> > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > exec, shared object, script and config file (such as seccomp config),
> > >
> > > "config file" that contains executable code.
> > >
> > Is seccomp config  considered as "contains executable code", seccomp
> > config is translated into bpf, so maybe yes ? but bpf is running in
> > the kernel.
> >
> > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > > different use cases:
> > > >
> > > > execveat clearly has less code change, but that also means: we can't
> > > > add logic specific to exec (i.e. logic that can't be applied to
> > > > config) for this part (from do_execveat_common to
> > > > security_bprm_creds_for_exec) in future.  This would require some
> > > > agreement/sign-off, I'm not sure from whom.
> > >
> > > I'm not sure to follow. We could still add new flags, but for now I
> > > don't see use cases.  This patch series is not meant to handle all
> > > possible "trust checks", only executable code, which makes sense for the
> > > kernel.
> > >
> > I guess the "configfile" discussion is where I get confused, at one
> > point, I think this would become a generic "trust checks" api for
> > everything related to "generating executable code", e.g. javascript,
> > java code, and more.
> > We will want to clearly define the scope of execveat(AT_CHECK)
> >
> > > If we want other checks, we'll need to clearly define their semantic and
> > > align with the kernel.  faccessat2(2) might be used to check other file
> > > properties, but the executable property is not only defined by the file
> > > attributes.
> > >
> > Agreed.
> >
> > > >
> > > > --------------------------
> > > > now looked at user cases (focus on elf for now)
> > > >
> > > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
> > > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)
> > > >
> > > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
> > > > /usr/bin/some.out will pass AT_CHECK
> > > >
> > > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
> > > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom
> > > > /tmp/ld.so (I assume this is possible  for elf header will set the
> > > > path for ld.so because kernel has no knowledge of that, and
> > > > binfmt_elf.c allocate memory for ld.so during execveat call)
> > > >
> > > > 4> dlopen(/tmp/a.so)
> > > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so
> > > > into memory.
> > > >
> > > > For case 1>
> > > > Alternative solution: Because AT_CHECK is always called, I think we
> > > > can avoid the first AT_CHECK call, and check during execveat(fd),
> > >
> > > There is no need to use AT_CHECK if we're going to call execveat(2) on
> > > the same file descriptor.  By design, AT_CHECK is implicit for any
> > > execve(2).
> > >
> > Yes. I realized I was wrong to say that ld.so will call execve() for
> > /tmp/a.out, there is no execve() call, otherwise it would have been
> > blocked already today.
> > The ld.so will  mmap the /tmp/a.out directly.  So case 1 is no
> > different than case 2 and 4.  ( the elf objects are mapped to memory
> > by dynamic linker.)
> > I'm not familiar with dynamic linker, Florian is on this thread, and
> > can help to correct me if my guess is wrong.
> 
> for Android, this has been the nail in the coffin of previous attempts
> to disallow running code from non-trusted filesystems --- instead of
> execing /tmp/a.out, the attacker just execs the linker with /tmp/a.out
> as an argument. people are doing this already in some cases, because
> we already have ineffectual "barriers" in place. [the usual argument
> for doing such things anyway is "it makes it harder to be doing this
> by _accident_".]

This AT_CHECK and related securebits should cover this case.

> 
> the other workaround for the attacker is to copy and paste the entire
> dynamic linker source and change the bits they don't like :-) (if
> you're thinking "is that a thing?", yes, so much so that the idea has
> been independently reinvented multiple times by several major legit
> apps and by basically every piece of DRM middleware. which is why --
> although i'm excited by mseal(2) -- i expect to face significant
> challenges rolling it out in Android _especially_ in places like
> "dynamic linker internal data structures" where i've wanted it for
> years!)
> 
> this proposal feels like it _ought_ to let a defender tighten their
> seccomp filter to require a "safe" fd if i'm using mmap() with an fd,
> but in practice as long as JITs exist i can always just copy code into
> a non-fd-backed mmap() region. and -- from the perspective of Android,
> where all "apps" are code loaded into a Java runtime -- there's not
> much getting away from JITs. (and last i looked, ART -- Android's Java
> runtime -- uses memfd() for the JIT cache.)

Using the feature brought by this patch series makes sense for trusted
executables willing to enforce a security policy.  Untrusted ones should
be sandboxed, which is the case for Android apps.

> 
> > > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
> > > > benefit is that there is no TOCTOU and save one round trip of syscall
> > > > for a succesful execveat() case.
> > >
> > > As long as user space uses the same file descriptor, there is no TOCTOU.
> > >
> > > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines
> > > the user space security policy.  The kernel already enforces the same
> > > security policy for any execve(2), whatever are the calling process's
> > > securebits.
> > >
> > > >
> > > > For case 2>
> > > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
> > > > However,  the process can all open then mmap() directly, it seems
> > > > minimal effort for an attacker to walk around such a defence from
> > > > dynamic linker.
> > >
> > > Which process?  What do you mean by "can all open then mmap() directly"?
> > >
> > > In this context the dynamic linker (like its parent processes) is
> > > trusted (guaranteed by the system).
> > >
> > > For case 2, the dynamic linker must check with AT_CHECK all files that
> > > will be mapped, which include /usr/bin/some.out and /tmp/a.so
> > >
> > My point is that the process can work around this by mmap() the file directly.
> >
> > > >
> > > > Alternative solution:
> > > > dynamic linker call AT_CHECK for each .so, kernel will save the state
> > > > (associated with fd)
> > > > kernel will check fd state at the time of mmap(fd, executable memory)
> > > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1
> > >
> > > The idea with AT_CHECK is that there is no kernel side effect, no extra
> > > kernel state, and the semantic is the same as with execve(2).
> > >
> > > This also enables us to check file's executable permission and ignore
> > > it, which is useful in a "permissive mode" when preparing for a
> > > migration without breaking a system, or to do extra integrity checks.
> > For preparing a migration (detect all violations), this is useful.
> > But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this
> > seems to be weak, at least for elf loading case.
> >
> > > BTW, this use case would also be more complex with a new openat2(2) flag
> > > like the original O_MAYEXEC.
> > >
> > > >
> > > > Alternative solution 2:
> > > > a new syscall to load the .so and enforce the AT_CHECK in kernel
> > >
> > > A new syscall would be overkill for this feature.  Please see Linus's
> > > comment.
> > >
> > maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap()
> > to executable memory.
> >
> > > >
> > > > This also means, for the solution to be complete, we might want to
> > > > block creation of executable anonymous memory (e.g. by seccomp, ),
> > >
> > > How seccomp could create anonymous memory in user space?
> > > seccomp filters should be treated (and checked with AT_CHECK) as
> > > executable code anyway.
> > >
> > > > unless the user space can harden the creation of  executable anonymous
> > > > memory in some way.
> > >
> > > User space is already in charge of mmapping its own memory.  I don't see
> > > what is missing.
> > >
> > > >
> > > > For case 3>
> > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > sure it passes AT_CHECK, before loading it into memory.
> > >
> > > All ELF dependencies are opened and checked with open_exec(), which
> > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > Did I miss something?
> > >
> > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > The app can choose its own dynamic linker path during build, (maybe
> > even statically link one ?)  This is another reason that relying on a
> > userspace only is not enough.
> >
> > > However, we must be careful with programs using the (deprecated)
> > > uselib(2). They should also check with AT_CHECK because this syscall
> > > opens the shared library without __FMODE_EXEC (similar to a simple file
> > > open). See
> > > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/
> > >
> > > >
> > > > For case 4>
> > > > same as case 2.
> > > >
> > > > Consider those cases: I think:
> > > > a> relying purely on userspace for enforcement does't seem to be
> > > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > > executable memory.
> > >
> > > As Steve explained (and is also explained in the patches), it is trivial
> > > if the attacker can already execute its own code, which is too late to
> > > enforce any script execution control.
> > >
> > > > b> if both user space and kernel need to call AT_CHECK, the faccessat
> > > > seems to be a better place for AT_CHECK, e.g. kernel can call
> > > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> > > > avoid complicating the execveat() code path.
> > >
> > > A previous version of this patches series already patched faccessat(2),
> > > but this is not the right place.  faccessat2(2) is dedicated to check
> > > file permissions, not executability (e.g. with mount's noexec).
> > >
> > > >
> > > > What do you think ?
> > >
> > > I think there are some misunderstandings.  Please let me know if it's
> > > clearer now.
> > >
> > I'm still not sure about the user case for dynamic linker (elf
> > loading) case. Maybe this patch is more suitable for scripts?
> > A detailed user case will help demonstrate the use case for dynamic
> > linker, e.g. what kind of app will benefit from
> > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
> > dealing with , what kind of attack chain we blocked as a result.
> >
> > > >
> > > > Thanks
> > > > -Jeff
> > > >
> > > > > With the information that a script interpreter is about to interpret a
> > > > > script, an LSM security policy can adjust caller's access rights or log
> > > > > execution request as for native script execution (e.g. role transition).
> > > > > This is possible thanks to the call to security_bprm_creds_for_exec().
> > > > >
> > > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > > > > current kernel code should not be a security issue (e.g. unexpected role
> > > > > transition).  LSMs willing to update the caller's credential could now
> > > > > do so when bprm->is_check is set.  Of course, such policy change should
> > > > > be in line with the new user space code.
> > > > >
> > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > > > > make sense for the kernel to parse the checked files, look for
> > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > > never called when AT_CHECK is used.
> > > > >
> > > > > It should be noted that script interpreters cannot directly use
> > > > > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > > > > executed to interpret the script.  Unlike the kernel, script
> > > > > interpreters may just interpret the shebang as a simple comment, which
> > > > > should not change for backward compatibility reasons.
> > > > >
> > > > > Because scripts or libraries files might not currently have the
> > > > > executable permission set, or because we might want specific users to be
> > > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > > > > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> > > > >
> > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > > This patch has been used for more than a decade with customized script
> > > > > interpreters.  Some examples can be found here:
> > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > > >
> > > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> > > > > ---
> > > > >
> > > > > New design since v18:
> > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > > ---
> > > > >  fs/exec.c                  |  5 +++--
> > > > >  include/linux/binfmts.h    |  7 ++++++-
> > > > >  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
> > > > >  kernel/audit.h             |  1 +
> > > > >  kernel/auditsc.c           |  1 +
> > > > >  5 files changed, 41 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > index 40073142288f..ea2a1867afdc 100644
> > > > > --- a/fs/exec.c
> > > > > +++ b/fs/exec.c
> > > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > > > >                 .lookup_flags = LOOKUP_FOLLOW,
> > > > >         };
> > > > >
> > > > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> > > > >                 return ERR_PTR(-EINVAL);
> > > > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > > > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > > > >                 bprm->filename = bprm->fdpath;
> > > > >         }
> > > > >         bprm->interp = bprm->filename;
> > > > > +       bprm->is_check = !!(flags & AT_CHECK);
> > > > >
> > > > >         retval = bprm_mm_init(bprm);
> > > > >         if (!retval)
> > > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > > > >
> > > > >         /* Set the unchanging part of bprm->cred */
> > > > >         retval = security_bprm_creds_for_exec(bprm);
> > > > > -       if (retval)
> > > > > +       if (retval || bprm->is_check)
> > > > >                 goto out;
> > > > >
> > > > >         retval = exec_binprm(bprm);
> > > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > > > index 70f97f685bff..8ff9c9e33aed 100644
> > > > > --- a/include/linux/binfmts.h
> > > > > +++ b/include/linux/binfmts.h
> > > > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > > > >                  * Set when errors can no longer be returned to the
> > > > >                  * original userspace.
> > > > >                  */
> > > > > -               point_of_no_return:1;
> > > > > +               point_of_no_return:1,
> > > > > +               /*
> > > > > +                * Set by user space to check executability according to the
> > > > > +                * caller's environment.
> > > > > +                */
> > > > > +               is_check:1;
> > > > >         struct file *executable; /* Executable to pass to the interpreter */
> > > > >         struct file *interpreter;
> > > > >         struct file *file;
> > > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > > > index c0bcc185fa48..bcd05c59b7df 100644
> > > > > --- a/include/uapi/linux/fcntl.h
> > > > > +++ b/include/uapi/linux/fcntl.h
> > > > > @@ -118,6 +118,36 @@
> > > > >  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
> > > > >                                         compare object identity and may not
> > > > >                                         be usable to open_by_handle_at(2) */
> > > > > +
> > > > > +/*
> > > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > > > > + * of this file would be allowed, ignoring the file format and then the related
> > > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> > > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> > > > > + * thread.  See securebits.h documentation.
> > > > > + *
> > > > > + * Programs should use this check to apply kernel-level checks against files
> > > > > + * that are not directly executed by the kernel but directly passed to a user
> > > > > + * space interpreter instead.  All files that contain executable code, from the
> > > > > + * point of view of the interpreter, should be checked.  The main purpose of
> > > > > + * this flag is to improve the security and consistency of an execution
> > > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and
> > > > > + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> > > > > + * instance, this can be used to check if a file is trustworthy according to
> > > > > + * the caller's environment.
> > > > > + *
> > > > > + * In a secure environment, libraries and any executable dependencies should
> > > > > + * also be checked.  For instance dynamic linking should make sure that all
> > > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > > > > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > > > > + * trusted code should be executable, which also requires integrity guarantees.
> > > > > + *
> > > > > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > > > > + * descriptor instead of a path.
> > > > > + */
> > > > > +#define AT_CHECK               0x10000
> > > > > +
> > > > >  #if defined(__KERNEL__)
> > > > >  #define AT_GETATTR_NOSEC       0x80000000
> > > > >  #endif
> > > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > > index a60d2840559e..8ebdabd2ab81 100644
> > > > > --- a/kernel/audit.h
> > > > > +++ b/kernel/audit.h
> > > > > @@ -197,6 +197,7 @@ struct audit_context {
> > > > >                 struct open_how openat2;
> > > > >                 struct {
> > > > >                         int                     argc;
> > > > > +                       bool                    is_check;
> > > > >                 } execve;
> > > > >                 struct {
> > > > >                         char                    *name;
> > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > > index 6f0d6fb6523f..b6316e284342 100644
> > > > > --- a/kernel/auditsc.c
> > > > > +++ b/kernel/auditsc.c
> > > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > > >
> > > > >         context->type = AUDIT_EXECVE;
> > > > >         context->execve.argc = bprm->argc;
> > > > > +       context->execve.is_check = bprm->is_check;
> > > > >  }
> > > > >
> > > > >
> > > > > --
> > > > > 2.45.2
> > > > >
> > > >
>

Jeff Xu July 18, 2024, 10:54 p.m. UTC | #35

On Thu, Jul 18, 2024 at 5:23 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Wed, Jul 17, 2024 at 06:51:11PM -0700, Jeff Xu wrote:
> > On Wed, Jul 17, 2024 at 3:00 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> > > > On 17/07/2024 07:33, Jeff Xu wrote:
> > > > > Consider those cases: I think:
> > > > > a> relying purely on userspace for enforcement does't seem to be
> > > > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > > > executable memory.
> > > >
> > > > If there's a way to do this without running executable code that had to pass
> > > > a previous execveat() check, then yeah, it's not effective (e.g. a Python
> > > > interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> > > >
> > > > Once arbitrary code is running, all bets are off. So long as all arbitrary
> > > > code is being checked itself, it's allowed to do things that would bypass
> > > > later checks (and it's up to whoever audited it in the first place to
> > > > prevent this by not giving it the special mark that allows it to pass the
> > > > check).
> > >
> > We will want to define what is considered as "arbitrary code is running"
> >
> > Using an example of ROP, attackers change the return address in stack,
> > e.g. direct the execution flow to a gauge to call "ld.so /tmp/a.out",
> > do you consider "arbitrary code is running" when stack is overwritten
> > ? or after execve() is called.
>
> Yes, ROP is arbitrary code execution (which can be mitigated with CFI).
> ROP could be enough to interpret custom commands and create a small
> interpreter/VM.
>
> > If it is later, this patch can prevent "ld.so /tmp/a.out".
> >
> > > Exactly.  As explained in the patches, one crucial prerequisite is that
> > > the executable code is trusted, and the system must provide integrity
> > > guarantees.  We cannot do anything without that.  This patches series is
> > > a building block to fix a blind spot on Linux systems to be able to
> > > fully control executability.
> >
> > Even trusted executable can have a bug.
>
> Definitely, but this patch series is dedicated to script execution
> control.
>
> >
> > I'm thinking in the context of ChromeOS, where all its system services
> > are from trusted partitions, and legit code won't load .so from a
> > non-exec mount.  But we want to sandbox those services, so even under
> > some kind of ROP attack, the service still won't be able to load .so
> > from /tmp. Of course, if an attacker can already write arbitrary
> > length of data into the stack, it is probably already a game over.
> >
>
> OK, you want to tie executable file permission to mmap.  That makes
> sense if you have a consistent execution model.  This can be enforced by
> LSMs.  Contrary to script interpretation which is a full user space
> implementation (and then controlled by user space), mmap restrictions
> should indeed be enforced by the kernel.
Ya, that is what I meant. it can be out of scope for this patch.
Indeed, as you point out, this patch is dedicated to script execution
control, and fixing ld.so /tmp/a.out is an extra bonus in addition to
script.

Thanks
-Jeff

Jeff Xu July 19, 2024, 1:29 a.m. UTC | #36

On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > executable code, scripts could also use this check [1].
> > > > >
> > > > > This is different than faccessat(2) which only checks file access
> > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > real execution, user space gets the same error codes.
> > > > >
> > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > exec, shared object, script and config file (such as seccomp config),
> > >
> > > "config file" that contains executable code.
> > >
> > Is seccomp config  considered as "contains executable code", seccomp
> > config is translated into bpf, so maybe yes ? but bpf is running in
> > the kernel.
>
> Because seccomp filters alter syscalls, they are similar to code
> injection.
>
that makes sense.

> >
> > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > > different use cases:
> > > >
> > > > execveat clearly has less code change, but that also means: we can't
> > > > add logic specific to exec (i.e. logic that can't be applied to
> > > > config) for this part (from do_execveat_common to
> > > > security_bprm_creds_for_exec) in future.  This would require some
> > > > agreement/sign-off, I'm not sure from whom.
> > >
> > > I'm not sure to follow. We could still add new flags, but for now I
> > > don't see use cases.  This patch series is not meant to handle all
> > > possible "trust checks", only executable code, which makes sense for the
> > > kernel.
> > >
> > I guess the "configfile" discussion is where I get confused, at one
> > point, I think this would become a generic "trust checks" api for
> > everything related to "generating executable code", e.g. javascript,
> > java code, and more.
> > We will want to clearly define the scope of execveat(AT_CHECK)
>
> The line between data and code is blurry.  For instance, a configuration
> file can impact the execution flow of a program.  So, where to draw the
> line?
>
> It might makes sense to follow the kernel and interpreter semantic: if a
> file can be executed by the kernel (e.g. ELF binary, file containing a
> shebang, or just configured with binfmt_misc), then this should be
> considered as executable code.  This applies to Bash, Python,
> Javascript, NodeJS, PE, PHP...  However, we can also make a picture
> executable with binfmt_misc.  So, again, where to draw the line?
>
> I'd recommend to think about interaction with the outside, through
> function calls, IPCs, syscalls...  For instance, "running" an image
> should not lead to reading or writing to arbitrary files, or accessing
> the network, but in practice it is legitimate for some file formats...
> PostScript is a programming language, but mostly used to draw pictures.
> So, again, where to draw the line?
>
> We should follow the principle of least astonishment.  What most users
> would expect?  This should follow the *common usage* of executable
> files.  At the end, the script interpreters will be patched by security
> folks for security reasons.  I think the right question to ask should
> be: could this file format be (ab)used to leak or modify arbitrary
> files, or to perform arbitrary syscalls?  If the answer is yes, then it
> should be checked for executability.  Of course, this excludes bugs
> exploited in the file format parser.
>
> I'll extend the next patch series with this rationale.
>
> >
> > > If we want other checks, we'll need to clearly define their semantic and
> > > align with the kernel.  faccessat2(2) might be used to check other file
> > > properties, but the executable property is not only defined by the file
> > > attributes.
> > >
> > Agreed.
> >
> > > >
> > > > --------------------------
> > > > now looked at user cases (focus on elf for now)
> > > >
> > > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount
> > > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd)
> > > >
> > > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so
> > > > /usr/bin/some.out will pass AT_CHECK
> > > >
> > > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so
> > > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom
> > > > /tmp/ld.so (I assume this is possible  for elf header will set the
> > > > path for ld.so because kernel has no knowledge of that, and
> > > > binfmt_elf.c allocate memory for ld.so during execveat call)
> > > >
> > > > 4> dlopen(/tmp/a.so)
> > > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so
> > > > into memory.
> > > >
> > > > For case 1>
> > > > Alternative solution: Because AT_CHECK is always called, I think we
> > > > can avoid the first AT_CHECK call, and check during execveat(fd),
> > >
> > > There is no need to use AT_CHECK if we're going to call execveat(2) on
> > > the same file descriptor.  By design, AT_CHECK is implicit for any
> > > execve(2).
> > >
> > Yes. I realized I was wrong to say that ld.so will call execve() for
> > /tmp/a.out, there is no execve() call, otherwise it would have been
> > blocked already today.
> > The ld.so will  mmap the /tmp/a.out directly.  So case 1 is no
> > different than case 2 and 4.  ( the elf objects are mapped to memory
> > by dynamic linker.)
> > I'm not familiar with dynamic linker, Florian is on this thread, and
> > can help to correct me if my guess is wrong.
> >
> > > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the
> > > > benefit is that there is no TOCTOU and save one round trip of syscall
> > > > for a succesful execveat() case.
> > >
> > > As long as user space uses the same file descriptor, there is no TOCTOU.
> > >
> > > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines
> > > the user space security policy.  The kernel already enforces the same
> > > security policy for any execve(2), whatever are the calling process's
> > > securebits.
> > >
> > > >
> > > > For case 2>
> > > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory.
> > > > However,  the process can all open then mmap() directly, it seems
> > > > minimal effort for an attacker to walk around such a defence from
> > > > dynamic linker.
> > >
> > > Which process?  What do you mean by "can all open then mmap() directly"?
> > >
> > > In this context the dynamic linker (like its parent processes) is
> > > trusted (guaranteed by the system).
> > >
> > > For case 2, the dynamic linker must check with AT_CHECK all files that
> > > will be mapped, which include /usr/bin/some.out and /tmp/a.so
> > >
> > My point is that the process can work around this by mmap() the file directly.
>
> Yes, see my answer in the other email. The process is trusted.
>
OK. Let's agree that this is out of scope for this patch series.

> >
> > > >
> > > > Alternative solution:
> > > > dynamic linker call AT_CHECK for each .so, kernel will save the state
> > > > (associated with fd)
> > > > kernel will check fd state at the time of mmap(fd, executable memory)
> > > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1
> > >
> > > The idea with AT_CHECK is that there is no kernel side effect, no extra
> > > kernel state, and the semantic is the same as with execve(2).
> > >
> > > This also enables us to check file's executable permission and ignore
> > > it, which is useful in a "permissive mode" when preparing for a
> > > migration without breaking a system, or to do extra integrity checks.
> > For preparing a migration (detect all violations), this is useful.
> > But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this
> > seems to be weak, at least for elf loading case.
>
> We could add more restrictions, but that is outside the scope of this
> patch series.
>
Agreed.

> >
> > > BTW, this use case would also be more complex with a new openat2(2) flag
> > > like the original O_MAYEXEC.
> > >
> > > >
> > > > Alternative solution 2:
> > > > a new syscall to load the .so and enforce the AT_CHECK in kernel
> > >
> > > A new syscall would be overkill for this feature.  Please see Linus's
> > > comment.
> > >
> > maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap()
> > to executable memory.
>
> OK, this is another story.
>
> >
> > > >
> > > > This also means, for the solution to be complete, we might want to
> > > > block creation of executable anonymous memory (e.g. by seccomp, ),
> > >
> > > How seccomp could create anonymous memory in user space?
> > > seccomp filters should be treated (and checked with AT_CHECK) as
> > > executable code anyway.
> > >
> > > > unless the user space can harden the creation of  executable anonymous
> > > > memory in some way.
> > >
> > > User space is already in charge of mmapping its own memory.  I don't see
> > > what is missing.
> > >
> > > >
> > > > For case 3>
> > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > sure it passes AT_CHECK, before loading it into memory.
> > >
> > > All ELF dependencies are opened and checked with open_exec(), which
> > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > Did I miss something?
> > >
> > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > The app can choose its own dynamic linker path during build, (maybe
> > even statically link one ?)  This is another reason that relying on a
> > userspace only is not enough.
>
> The kernel calls open_exec() on all dependencies, including
> ld-linux-x86-64.so.2, so these files are checked for executability too.
>
This might not be entirely true. iiuc, kernel  calls open_exec for
open_exec for interpreter, but not all its dependency (e.g. libc.so.6)
load_elf_binary() {
   interpreter = open_exec(elf_interpreter);
}

libc.so.6 is opened and mapped by dynamic linker.
so the call sequence is:
 execve(a.out)
  - open exec(a.out)
  - security_bprm_creds(a.out)
  - open the exec(ld.so)
  - call open_exec() for interruptor (ld.so)
  - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
the same check and code path as libc.so below ?
  - transfer the control to ld.so)
  - ld.so open (libc.so)
  - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
require dynamic linker change.
  - ld.so mmap(libc.so,rx)


> >
> > > However, we must be careful with programs using the (deprecated)
> > > uselib(2). They should also check with AT_CHECK because this syscall
> > > opens the shared library without __FMODE_EXEC (similar to a simple file
> > > open). See
> > > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/
> > >
> > > >
> > > > For case 4>
> > > > same as case 2.
> > > >
> > > > Consider those cases: I think:
> > > > a> relying purely on userspace for enforcement does't seem to be
> > > > effective,  e.g. it is trivial  to call open(), then mmap() it into
> > > > executable memory.
> > >
> > > As Steve explained (and is also explained in the patches), it is trivial
> > > if the attacker can already execute its own code, which is too late to
> > > enforce any script execution control.
> > >
> > > > b> if both user space and kernel need to call AT_CHECK, the faccessat
> > > > seems to be a better place for AT_CHECK, e.g. kernel can call
> > > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will
> > > > avoid complicating the execveat() code path.
> > >
> > > A previous version of this patches series already patched faccessat(2),
> > > but this is not the right place.  faccessat2(2) is dedicated to check
> > > file permissions, not executability (e.g. with mount's noexec).
> > >
> > > >
> > > > What do you think ?
> > >
> > > I think there are some misunderstandings.  Please let me know if it's
> > > clearer now.
> > >
> > I'm still not sure about the user case for dynamic linker (elf
> > loading) case. Maybe this patch is more suitable for scripts?
>
> It's suitable for both, but we could add more restriction on mmap
> with an (existing) LSM.  The kernel already checks for mount's noexec
> when mapping a file, but not for the file permission, which is OK
> because it could be bypassed by coping the content of the file and
> mprotecting it anyway.  For a consistent memory execution control, all
> memory mapping need to be restricted, which is out of scope for this
> patch series.
>
Ok.

> > A detailed user case will help demonstrate the use case for dynamic
> > linker, e.g. what kind of app will benefit from
> > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
> > dealing with , what kind of attack chain we blocked as a result.
>
> I explained that in the patches and in the description of these new
> securebits.  Please point which part is not clear.  The full threat
> model is simple: the TCB includes the kernel and system's files, which
> are integrity-protected, but we don't trust arbitrary data/scripts that
> can be written to user-owned files or directly provided to script
> interpreters.  As for the ptrace restrictions, the dynamic linker
> restrictions helps to avoid trivial bypasses (e.g. with LD_PRELOAD)
> with consistent executability checks.
>
On elf loading case, I'm clear after your last email. However, I'm not
sure if everyone else follows,  I will try to summarize here:
- Problem:  ld.so /tmp/a.out will happily pass, even /tmp/a.out is
mounted as non-exec.
  Solution: ld.so call execveat(AT_CHECK) for a.out before mmap a.out
into memory.

- Problem: a poorly built application (a.out) can have a dependency on
/tmp/a.o, when /tmp/a.o is on non-exec mount,
  Solution: ld.so call execveat(AT_CHECK) for a.o, before mmap a.o into memory.

- Problem: application can call mmap (/tmp/a.out, rx), where /tmp is
on non-exec mount
  This is out of scope, i.e. will require enforcement on mmap(), maybe
through LSM

Thanks
Best regards
-Jeff

-Jeff


> >
> > > >
> > > > Thanks
> > > > -Jeff
> > > >
> > > > > With the information that a script interpreter is about to interpret a
> > > > > script, an LSM security policy can adjust caller's access rights or log
> > > > > execution request as for native script execution (e.g. role transition).
> > > > > This is possible thanks to the call to security_bprm_creds_for_exec().
> > > > >
> > > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with
> > > > > current kernel code should not be a security issue (e.g. unexpected role
> > > > > transition).  LSMs willing to update the caller's credential could now
> > > > > do so when bprm->is_check is set.  Of course, such policy change should
> > > > > be in line with the new user space code.
> > > > >
> > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't
> > > > > make sense for the kernel to parse the checked files, look for
> > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
> > > > > if the format is unknown.  Because of that, security_bprm_check() is
> > > > > never called when AT_CHECK is used.
> > > > >
> > > > > It should be noted that script interpreters cannot directly use
> > > > > execveat(2) (without this new AT_CHECK flag) because this could lead to
> > > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being
> > > > > executed to interpret the script.  Unlike the kernel, script
> > > > > interpreters may just interpret the shebang as a simple comment, which
> > > > > should not change for backward compatibility reasons.
> > > > >
> > > > > Because scripts or libraries files might not currently have the
> > > > > executable permission set, or because we might want specific users to be
> > > > > allowed to run arbitrary scripts, the following patch provides a dynamic
> > > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and
> > > > > SECBIT_SHOULD_EXEC_RESTRICT securebits.
> > > > >
> > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC:
> > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
> > > > > This patch has been used for more than a decade with customized script
> > > > > interpreters.  Some examples can be found here:
> > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC
> > > > >
> > > > > Cc: Al Viro <viro@zeniv.linux.org.uk>
> > > > > Cc: Christian Brauner <brauner@kernel.org>
> > > > > Cc: Kees Cook <keescook@chromium.org>
> > > > > Cc: Paul Moore <paul@paul-moore.com>
> > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1]
> > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net>
> > > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net
> > > > > ---
> > > > >
> > > > > New design since v18:
> > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net
> > > > > ---
> > > > >  fs/exec.c                  |  5 +++--
> > > > >  include/linux/binfmts.h    |  7 ++++++-
> > > > >  include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++
> > > > >  kernel/audit.h             |  1 +
> > > > >  kernel/auditsc.c           |  1 +
> > > > >  5 files changed, 41 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/fs/exec.c b/fs/exec.c
> > > > > index 40073142288f..ea2a1867afdc 100644
> > > > > --- a/fs/exec.c
> > > > > +++ b/fs/exec.c
> > > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags)
> > > > >                 .lookup_flags = LOOKUP_FOLLOW,
> > > > >         };
> > > > >
> > > > > -       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0)
> > > > > +       if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0)
> > > > >                 return ERR_PTR(-EINVAL);
> > > > >         if (flags & AT_SYMLINK_NOFOLLOW)
> > > > >                 open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW;
> > > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl
> > > > >                 bprm->filename = bprm->fdpath;
> > > > >         }
> > > > >         bprm->interp = bprm->filename;
> > > > > +       bprm->is_check = !!(flags & AT_CHECK);
> > > > >
> > > > >         retval = bprm_mm_init(bprm);
> > > > >         if (!retval)
> > > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm)
> > > > >
> > > > >         /* Set the unchanging part of bprm->cred */
> > > > >         retval = security_bprm_creds_for_exec(bprm);
> > > > > -       if (retval)
> > > > > +       if (retval || bprm->is_check)
> > > > >                 goto out;
> > > > >
> > > > >         retval = exec_binprm(bprm);
> > > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> > > > > index 70f97f685bff..8ff9c9e33aed 100644
> > > > > --- a/include/linux/binfmts.h
> > > > > +++ b/include/linux/binfmts.h
> > > > > @@ -42,7 +42,12 @@ struct linux_binprm {
> > > > >                  * Set when errors can no longer be returned to the
> > > > >                  * original userspace.
> > > > >                  */
> > > > > -               point_of_no_return:1;
> > > > > +               point_of_no_return:1,
> > > > > +               /*
> > > > > +                * Set by user space to check executability according to the
> > > > > +                * caller's environment.
> > > > > +                */
> > > > > +               is_check:1;
> > > > >         struct file *executable; /* Executable to pass to the interpreter */
> > > > >         struct file *interpreter;
> > > > >         struct file *file;
> > > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > > > > index c0bcc185fa48..bcd05c59b7df 100644
> > > > > --- a/include/uapi/linux/fcntl.h
> > > > > +++ b/include/uapi/linux/fcntl.h
> > > > > @@ -118,6 +118,36 @@
> > > > >  #define AT_HANDLE_FID          AT_REMOVEDIR    /* file handle is needed to
> > > > >                                         compare object identity and may not
> > > > >                                         be usable to open_by_handle_at(2) */
> > > > > +
> > > > > +/*
> > > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution
> > > > > + * of this file would be allowed, ignoring the file format and then the related
> > > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang).  AT_CHECK
> > > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling
> > > > > + * thread.  See securebits.h documentation.
> > > > > + *
> > > > > + * Programs should use this check to apply kernel-level checks against files
> > > > > + * that are not directly executed by the kernel but directly passed to a user
> > > > > + * space interpreter instead.  All files that contain executable code, from the
> > > > > + * point of view of the interpreter, should be checked.  The main purpose of
> > > > > + * this flag is to improve the security and consistency of an execution
> > > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and
> > > > > + * indirect file execution (e.g. sh script.sh) lead to the same result.  For
> > > > > + * instance, this can be used to check if a file is trustworthy according to
> > > > > + * the caller's environment.
> > > > > + *
> > > > > + * In a secure environment, libraries and any executable dependencies should
> > > > > + * also be checked.  For instance dynamic linking should make sure that all
> > > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using
> > > > > + * LD_PRELOAD).  For such secure execution environment to make sense, only
> > > > > + * trusted code should be executable, which also requires integrity guarantees.
> > > > > + *
> > > > > + * To avoid race conditions leading to time-of-check to time-of-use issues,
> > > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file
> > > > > + * descriptor instead of a path.
> > > > > + */
> > > > > +#define AT_CHECK               0x10000
> > > > > +
> > > > >  #if defined(__KERNEL__)
> > > > >  #define AT_GETATTR_NOSEC       0x80000000
> > > > >  #endif
> > > > > diff --git a/kernel/audit.h b/kernel/audit.h
> > > > > index a60d2840559e..8ebdabd2ab81 100644
> > > > > --- a/kernel/audit.h
> > > > > +++ b/kernel/audit.h
> > > > > @@ -197,6 +197,7 @@ struct audit_context {
> > > > >                 struct open_how openat2;
> > > > >                 struct {
> > > > >                         int                     argc;
> > > > > +                       bool                    is_check;
> > > > >                 } execve;
> > > > >                 struct {
> > > > >                         char                    *name;
> > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > > > index 6f0d6fb6523f..b6316e284342 100644
> > > > > --- a/kernel/auditsc.c
> > > > > +++ b/kernel/auditsc.c
> > > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm)
> > > > >
> > > > >         context->type = AUDIT_EXECVE;
> > > > >         context->execve.argc = bprm->argc;
> > > > > +       context->execve.is_check = bprm->is_check;
> > > > >  }
> > > > >
> > > > >
> > > > > --
> > > > > 2.45.2
> > > > >
> > > >
> >

Mickaël Salaün July 19, 2024, 8:44 a.m. UTC | #37

On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > executable code, scripts could also use this check [1].
> > > > > >
> > > > > > This is different than faccessat(2) which only checks file access
> > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > real execution, user space gets the same error codes.
> > > > > >
> > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > exec, shared object, script and config file (such as seccomp config),

> > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > sure it passes AT_CHECK, before loading it into memory.
> > > >
> > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > Did I miss something?
> > > >
> > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > The app can choose its own dynamic linker path during build, (maybe
> > > even statically link one ?)  This is another reason that relying on a
> > > userspace only is not enough.
> >
> > The kernel calls open_exec() on all dependencies, including
> > ld-linux-x86-64.so.2, so these files are checked for executability too.
> >
> This might not be entirely true. iiuc, kernel  calls open_exec for
> open_exec for interpreter, but not all its dependency (e.g. libc.so.6)

Correct, the dynamic linker is in charge of that, which is why it must
be enlighten with execveat+AT_CHECK and securebits checks.

> load_elf_binary() {
>    interpreter = open_exec(elf_interpreter);
> }
> 
> libc.so.6 is opened and mapped by dynamic linker.
> so the call sequence is:
>  execve(a.out)
>   - open exec(a.out)
>   - security_bprm_creds(a.out)
>   - open the exec(ld.so)
>   - call open_exec() for interruptor (ld.so)
>   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> the same check and code path as libc.so below ?

open_exec() checks are enough.  LSMs can use this information (open +
__FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
request.

>   - transfer the control to ld.so)
>   - ld.so open (libc.so)
>   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> require dynamic linker change.
>   - ld.so mmap(libc.so,rx)

Explaining these steps is useful. I'll include that in the next patch
series.

> > > A detailed user case will help demonstrate the use case for dynamic
> > > linker, e.g. what kind of app will benefit from
> > > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
> > > dealing with , what kind of attack chain we blocked as a result.
> >
> > I explained that in the patches and in the description of these new
> > securebits.  Please point which part is not clear.  The full threat
> > model is simple: the TCB includes the kernel and system's files, which
> > are integrity-protected, but we don't trust arbitrary data/scripts that
> > can be written to user-owned files or directly provided to script
> > interpreters.  As for the ptrace restrictions, the dynamic linker
> > restrictions helps to avoid trivial bypasses (e.g. with LD_PRELOAD)
> > with consistent executability checks.
> >
> On elf loading case, I'm clear after your last email. However, I'm not
> sure if everyone else follows,  I will try to summarize here:
> - Problem:  ld.so /tmp/a.out will happily pass, even /tmp/a.out is
> mounted as non-exec.
>   Solution: ld.so call execveat(AT_CHECK) for a.out before mmap a.out
> into memory.
> 
> - Problem: a poorly built application (a.out) can have a dependency on
> /tmp/a.o, when /tmp/a.o is on non-exec mount,
>   Solution: ld.so call execveat(AT_CHECK) for a.o, before mmap a.o into memory.
> 
> - Problem: application can call mmap (/tmp/a.out, rx), where /tmp is
> on non-exec mount

I'd say "malicious or non-enlightened processes" can call mmap without
execveat+AT_CHECK...

>   This is out of scope, i.e. will require enforcement on mmap(), maybe
> through LSM

Cool, I'll include that as well. Thanks.

Jeff Xu July 19, 2024, 2:16 p.m. UTC | #38

On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > >
> > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > executable code, scripts could also use this check [1].
> > > > > > >
> > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > real execution, user space gets the same error codes.
> > > > > > >
> > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > exec, shared object, script and config file (such as seccomp config),
>
> > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > >
> > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > Did I miss something?
> > > > >
> > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > The app can choose its own dynamic linker path during build, (maybe
> > > > even statically link one ?)  This is another reason that relying on a
> > > > userspace only is not enough.
> > >
> > > The kernel calls open_exec() on all dependencies, including
> > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > >
> > This might not be entirely true. iiuc, kernel  calls open_exec for
> > open_exec for interpreter, but not all its dependency (e.g. libc.so.6)
>
> Correct, the dynamic linker is in charge of that, which is why it must
> be enlighten with execveat+AT_CHECK and securebits checks.
>
> > load_elf_binary() {
> >    interpreter = open_exec(elf_interpreter);
> > }
> >
> > libc.so.6 is opened and mapped by dynamic linker.
> > so the call sequence is:
> >  execve(a.out)
> >   - open exec(a.out)
> >   - security_bprm_creds(a.out)
> >   - open the exec(ld.so)
> >   - call open_exec() for interruptor (ld.so)
> >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > the same check and code path as libc.so below ?
>
> open_exec() checks are enough.  LSMs can use this information (open +
> __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> request.
>
Then the ld.so doesn't go through the same security_bprm_creds() check
as other .so.

As my previous email, the ChromeOS LSM restricts executable mfd
through security_bprm_creds(), the end result is that ld.so can still
be executable memfd, but not other .so.

One way to address this is to refactor the necessary code from
execveat() code patch, and make it available to call from both kernel
and execveat() code paths., but if we do that, we might as well use
faccessat2(AT_CHECK)


> >   - transfer the control to ld.so)
> >   - ld.so open (libc.so)
> >   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> > require dynamic linker change.
> >   - ld.so mmap(libc.so,rx)
>
> Explaining these steps is useful. I'll include that in the next patch
> series.
>
> > > > A detailed user case will help demonstrate the use case for dynamic
> > > > linker, e.g. what kind of app will benefit from
> > > > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we
> > > > dealing with , what kind of attack chain we blocked as a result.
> > >
> > > I explained that in the patches and in the description of these new
> > > securebits.  Please point which part is not clear.  The full threat
> > > model is simple: the TCB includes the kernel and system's files, which
> > > are integrity-protected, but we don't trust arbitrary data/scripts that
> > > can be written to user-owned files or directly provided to script
> > > interpreters.  As for the ptrace restrictions, the dynamic linker
> > > restrictions helps to avoid trivial bypasses (e.g. with LD_PRELOAD)
> > > with consistent executability checks.
> > >
> > On elf loading case, I'm clear after your last email. However, I'm not
> > sure if everyone else follows,  I will try to summarize here:
> > - Problem:  ld.so /tmp/a.out will happily pass, even /tmp/a.out is
> > mounted as non-exec.
> >   Solution: ld.so call execveat(AT_CHECK) for a.out before mmap a.out
> > into memory.
> >
> > - Problem: a poorly built application (a.out) can have a dependency on
> > /tmp/a.o, when /tmp/a.o is on non-exec mount,
> >   Solution: ld.so call execveat(AT_CHECK) for a.o, before mmap a.o into memory.
> >
> > - Problem: application can call mmap (/tmp/a.out, rx), where /tmp is
> > on non-exec mount
>
> I'd say "malicious or non-enlightened processes" can call mmap without
> execveat+AT_CHECK...
>
> >   This is out of scope, i.e. will require enforcement on mmap(), maybe
> > through LSM
>
> Cool, I'll include that as well. Thanks.

Mickaël Salaün July 19, 2024, 3:04 p.m. UTC | #39

On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote:
> On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > >
> > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > >
> > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > real execution, user space gets the same error codes.
> > > > > > > >
> > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > exec, shared object, script and config file (such as seccomp config),
> >
> > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > > >
> > > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > > Did I miss something?
> > > > > >
> > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > > The app can choose its own dynamic linker path during build, (maybe
> > > > > even statically link one ?)  This is another reason that relying on a
> > > > > userspace only is not enough.
> > > >
> > > > The kernel calls open_exec() on all dependencies, including
> > > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > > >
> > > This might not be entirely true. iiuc, kernel  calls open_exec for
> > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6)
> >
> > Correct, the dynamic linker is in charge of that, which is why it must
> > be enlighten with execveat+AT_CHECK and securebits checks.
> >
> > > load_elf_binary() {
> > >    interpreter = open_exec(elf_interpreter);
> > > }
> > >
> > > libc.so.6 is opened and mapped by dynamic linker.
> > > so the call sequence is:
> > >  execve(a.out)
> > >   - open exec(a.out)
> > >   - security_bprm_creds(a.out)
> > >   - open the exec(ld.so)
> > >   - call open_exec() for interruptor (ld.so)
> > >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > > the same check and code path as libc.so below ?
> >
> > open_exec() checks are enough.  LSMs can use this information (open +
> > __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> > request.
> >
> Then the ld.so doesn't go through the same security_bprm_creds() check
> as other .so.

Indeed, but...

> 
> As my previous email, the ChromeOS LSM restricts executable mfd
> through security_bprm_creds(), the end result is that ld.so can still
> be executable memfd, but not other .so.

The chromeOS LSM can check that with the security_file_open() hook and
the __FMODE_EXEC flag, see Landlock's implementation.  I think this
should be the only hook implementation that chromeOS LSM needs to add.

> 
> One way to address this is to refactor the necessary code from
> execveat() code patch, and make it available to call from both kernel
> and execveat() code paths., but if we do that, we might as well use
> faccessat2(AT_CHECK)

That's why I think it makes sense to rely on the existing __FMODE_EXEC
information.

> 
> 
> > >   - transfer the control to ld.so)
> > >   - ld.so open (libc.so)
> > >   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> > > require dynamic linker change.
> > >   - ld.so mmap(libc.so,rx)
> >
> > Explaining these steps is useful. I'll include that in the next patch
> > series.

Jeff Xu July 19, 2024, 3:12 p.m. UTC | #40

On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > executable code, scripts could also use this check [1].
> > > > >
> > > > > This is different than faccessat(2) which only checks file access
> > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > real execution, user space gets the same error codes.
> > > > >
> > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > exec, shared object, script and config file (such as seccomp config),
> > >
> > > "config file" that contains executable code.
> > >
> > Is seccomp config  considered as "contains executable code", seccomp
> > config is translated into bpf, so maybe yes ? but bpf is running in
> > the kernel.
>
> Because seccomp filters alter syscalls, they are similar to code
> injection.
>
> >
> > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > > different use cases:
> > > >
> > > > execveat clearly has less code change, but that also means: we can't
> > > > add logic specific to exec (i.e. logic that can't be applied to
> > > > config) for this part (from do_execveat_common to
> > > > security_bprm_creds_for_exec) in future.  This would require some
> > > > agreement/sign-off, I'm not sure from whom.
> > >
> > > I'm not sure to follow. We could still add new flags, but for now I
> > > don't see use cases.  This patch series is not meant to handle all
> > > possible "trust checks", only executable code, which makes sense for the
> > > kernel.
> > >
> > I guess the "configfile" discussion is where I get confused, at one
> > point, I think this would become a generic "trust checks" api for
> > everything related to "generating executable code", e.g. javascript,
> > java code, and more.
> > We will want to clearly define the scope of execveat(AT_CHECK)
>
> The line between data and code is blurry.  For instance, a configuration
> file can impact the execution flow of a program.  So, where to draw the
> line?
>
> It might makes sense to follow the kernel and interpreter semantic: if a
> file can be executed by the kernel (e.g. ELF binary, file containing a
> shebang, or just configured with binfmt_misc), then this should be
> considered as executable code.  This applies to Bash, Python,
> Javascript, NodeJS, PE, PHP...  However, we can also make a picture
> executable with binfmt_misc.  So, again, where to draw the line?
>
> I'd recommend to think about interaction with the outside, through
> function calls, IPCs, syscalls...  For instance, "running" an image
> should not lead to reading or writing to arbitrary files, or accessing
> the network, but in practice it is legitimate for some file formats...
> PostScript is a programming language, but mostly used to draw pictures.
> So, again, where to draw the line?
>
The javascript is run by browser and java code by java runtime, do
they meet the criteria? they do not interact with the kernel directly,
however they might have the same "executable" characteristics and the
app might not want them to be put into non-exec mount.

If the answer is yes, they can also use execveat(AT_CHECK),  the next
question is: does it make sense for javacript/java code to go through
execveat() code path, allocate bprm, etc ? (I don't have answer, maybe
it is)

> We should follow the principle of least astonishment.  What most users
> would expect?  This should follow the *common usage* of executable
> files.  At the end, the script interpreters will be patched by security
> folks for security reasons.  I think the right question to ask should
> be: could this file format be (ab)used to leak or modify arbitrary
> files, or to perform arbitrary syscalls?  If the answer is yes, then it
> should be checked for executability.  Of course, this excludes bugs
> exploited in the file format parser.
>
> I'll extend the next patch series with this rationale.
>

Jeff Xu July 19, 2024, 3:27 p.m. UTC | #41

On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote:
> > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > >
> > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > >
> > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > > >
> > > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > > real execution, user space gets the same error codes.
> > > > > > > > >
> > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > > exec, shared object, script and config file (such as seccomp config),
> > >
> > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > > > >
> > > > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > > > Did I miss something?
> > > > > > >
> > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > > > The app can choose its own dynamic linker path during build, (maybe
> > > > > > even statically link one ?)  This is another reason that relying on a
> > > > > > userspace only is not enough.
> > > > >
> > > > > The kernel calls open_exec() on all dependencies, including
> > > > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > > > >
> > > > This might not be entirely true. iiuc, kernel  calls open_exec for
> > > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6)
> > >
> > > Correct, the dynamic linker is in charge of that, which is why it must
> > > be enlighten with execveat+AT_CHECK and securebits checks.
> > >
> > > > load_elf_binary() {
> > > >    interpreter = open_exec(elf_interpreter);
> > > > }
> > > >
> > > > libc.so.6 is opened and mapped by dynamic linker.
> > > > so the call sequence is:
> > > >  execve(a.out)
> > > >   - open exec(a.out)
> > > >   - security_bprm_creds(a.out)
> > > >   - open the exec(ld.so)
> > > >   - call open_exec() for interruptor (ld.so)
> > > >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > > > the same check and code path as libc.so below ?
> > >
> > > open_exec() checks are enough.  LSMs can use this information (open +
> > > __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> > > request.
> > >
> > Then the ld.so doesn't go through the same security_bprm_creds() check
> > as other .so.
>
> Indeed, but...
>
My point is: we will want all the .so going through the same code
path, so  security_ functions are called consistently across all the
objects, And in the future, if we want to develop additional LSM
functionality based on AT_CHECK, it will be applied to all objects.

Another thing to consider is:  we are asking userspace to make
additional syscall before  loading the file into memory/get executed,
there is a possibility for future expansion of the mechanism, without
asking user space to add another syscall again.

I m still not convinced yet that execveat(AT_CHECK) fits more than
faccessat(AT_CHECK)


> >
> > As my previous email, the ChromeOS LSM restricts executable mfd
> > through security_bprm_creds(), the end result is that ld.so can still
> > be executable memfd, but not other .so.
>
> The chromeOS LSM can check that with the security_file_open() hook and
> the __FMODE_EXEC flag, see Landlock's implementation.  I think this
> should be the only hook implementation that chromeOS LSM needs to add.
>
> >
> > One way to address this is to refactor the necessary code from
> > execveat() code patch, and make it available to call from both kernel
> > and execveat() code paths., but if we do that, we might as well use
> > faccessat2(AT_CHECK)
>
> That's why I think it makes sense to rely on the existing __FMODE_EXEC
> information.
>
> >
> >
> > > >   - transfer the control to ld.so)
> > > >   - ld.so open (libc.so)
> > > >   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> > > > require dynamic linker change.
> > > >   - ld.so mmap(libc.so,rx)
> > >
> > > Explaining these steps is useful. I'll include that in the next patch
> > > series.

Mickaël Salaün July 19, 2024, 3:31 p.m. UTC | #42

On Fri, Jul 19, 2024 at 08:12:37AM -0700, Jeff Xu wrote:
> On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > executable code, scripts could also use this check [1].
> > > > > >
> > > > > > This is different than faccessat(2) which only checks file access
> > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > real execution, user space gets the same error codes.
> > > > > >
> > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > exec, shared object, script and config file (such as seccomp config),
> > > >
> > > > "config file" that contains executable code.
> > > >
> > > Is seccomp config  considered as "contains executable code", seccomp
> > > config is translated into bpf, so maybe yes ? but bpf is running in
> > > the kernel.
> >
> > Because seccomp filters alter syscalls, they are similar to code
> > injection.
> >
> > >
> > > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > > > different use cases:
> > > > >
> > > > > execveat clearly has less code change, but that also means: we can't
> > > > > add logic specific to exec (i.e. logic that can't be applied to
> > > > > config) for this part (from do_execveat_common to
> > > > > security_bprm_creds_for_exec) in future.  This would require some
> > > > > agreement/sign-off, I'm not sure from whom.
> > > >
> > > > I'm not sure to follow. We could still add new flags, but for now I
> > > > don't see use cases.  This patch series is not meant to handle all
> > > > possible "trust checks", only executable code, which makes sense for the
> > > > kernel.
> > > >
> > > I guess the "configfile" discussion is where I get confused, at one
> > > point, I think this would become a generic "trust checks" api for
> > > everything related to "generating executable code", e.g. javascript,
> > > java code, and more.
> > > We will want to clearly define the scope of execveat(AT_CHECK)
> >
> > The line between data and code is blurry.  For instance, a configuration
> > file can impact the execution flow of a program.  So, where to draw the
> > line?
> >
> > It might makes sense to follow the kernel and interpreter semantic: if a
> > file can be executed by the kernel (e.g. ELF binary, file containing a
> > shebang, or just configured with binfmt_misc), then this should be
> > considered as executable code.  This applies to Bash, Python,
> > Javascript, NodeJS, PE, PHP...  However, we can also make a picture
> > executable with binfmt_misc.  So, again, where to draw the line?
> >
> > I'd recommend to think about interaction with the outside, through
> > function calls, IPCs, syscalls...  For instance, "running" an image
> > should not lead to reading or writing to arbitrary files, or accessing
> > the network, but in practice it is legitimate for some file formats...
> > PostScript is a programming language, but mostly used to draw pictures.
> > So, again, where to draw the line?
> >
> The javascript is run by browser and java code by java runtime, do
> they meet the criteria? they do not interact with the kernel directly,
> however they might have the same "executable" characteristics and the
> app might not want them to be put into non-exec mount.
> 
> If the answer is yes, they can also use execveat(AT_CHECK),  the next
> question is: does it make sense for javacript/java code to go through
> execveat() code path, allocate bprm, etc ? (I don't have answer, maybe
> it is)

Java and NodeJS can do arbitrary syscalls (through their runtime) and
they can access arbitrary files, so according to my below comment, yes
they should be managed as potentially dangerous executable code.

The question should be: is this code trusted? Most of the time it is
not, hence the security model of web browser and their heavy use of
sandboxing.  So no, I don't think it would make sense to check this kind
of code more than what the browser already do.

I'll talk about this use case in the next patch series.

> 
> > We should follow the principle of least astonishment.  What most users
> > would expect?  This should follow the *common usage* of executable
> > files.  At the end, the script interpreters will be patched by security
> > folks for security reasons.  I think the right question to ask should
> > be: could this file format be (ab)used to leak or modify arbitrary
> > files, or to perform arbitrary syscalls?  If the answer is yes, then it
> > should be checked for executability.  Of course, this excludes bugs
> > exploited in the file format parser.
> >
> > I'll extend the next patch series with this rationale.
> >
>

Jeff Xu July 19, 2024, 5:36 p.m. UTC | #43

On Fri, Jul 19, 2024 at 8:31 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Fri, Jul 19, 2024 at 08:12:37AM -0700, Jeff Xu wrote:
> > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > >
> > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > executable code, scripts could also use this check [1].
> > > > > > >
> > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > real execution, user space gets the same error codes.
> > > > > > >
> > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > exec, shared object, script and config file (such as seccomp config),
> > > > >
> > > > > "config file" that contains executable code.
> > > > >
> > > > Is seccomp config  considered as "contains executable code", seccomp
> > > > config is translated into bpf, so maybe yes ? but bpf is running in
> > > > the kernel.
> > >
> > > Because seccomp filters alter syscalls, they are similar to code
> > > injection.
> > >
> > > >
> > > > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > > > > different use cases:
> > > > > >
> > > > > > execveat clearly has less code change, but that also means: we can't
> > > > > > add logic specific to exec (i.e. logic that can't be applied to
> > > > > > config) for this part (from do_execveat_common to
> > > > > > security_bprm_creds_for_exec) in future.  This would require some
> > > > > > agreement/sign-off, I'm not sure from whom.
> > > > >
> > > > > I'm not sure to follow. We could still add new flags, but for now I
> > > > > don't see use cases.  This patch series is not meant to handle all
> > > > > possible "trust checks", only executable code, which makes sense for the
> > > > > kernel.
> > > > >
> > > > I guess the "configfile" discussion is where I get confused, at one
> > > > point, I think this would become a generic "trust checks" api for
> > > > everything related to "generating executable code", e.g. javascript,
> > > > java code, and more.
> > > > We will want to clearly define the scope of execveat(AT_CHECK)
> > >
> > > The line between data and code is blurry.  For instance, a configuration
> > > file can impact the execution flow of a program.  So, where to draw the
> > > line?
> > >
> > > It might makes sense to follow the kernel and interpreter semantic: if a
> > > file can be executed by the kernel (e.g. ELF binary, file containing a
> > > shebang, or just configured with binfmt_misc), then this should be
> > > considered as executable code.  This applies to Bash, Python,
> > > Javascript, NodeJS, PE, PHP...  However, we can also make a picture
> > > executable with binfmt_misc.  So, again, where to draw the line?
> > >
> > > I'd recommend to think about interaction with the outside, through
> > > function calls, IPCs, syscalls...  For instance, "running" an image
> > > should not lead to reading or writing to arbitrary files, or accessing
> > > the network, but in practice it is legitimate for some file formats...
> > > PostScript is a programming language, but mostly used to draw pictures.
> > > So, again, where to draw the line?
> > >
> > The javascript is run by browser and java code by java runtime, do
> > they meet the criteria? they do not interact with the kernel directly,
> > however they might have the same "executable" characteristics and the
> > app might not want them to be put into non-exec mount.
> >
> > If the answer is yes, they can also use execveat(AT_CHECK),  the next
> > question is: does it make sense for javacript/java code to go through
> > execveat() code path, allocate bprm, etc ? (I don't have answer, maybe
> > it is)
>
> Java and NodeJS can do arbitrary syscalls (through their runtime) and
> they can access arbitrary files, so according to my below comment, yes
> they should be managed as potentially dangerous executable code.
>
> The question should be: is this code trusted? Most of the time it is
> not, hence the security model of web browser and their heavy use of
> sandboxing.  So no, I don't think it would make sense to check this kind
> of code more than what the browser already do.
>

If I understand you correctly, Java/NodeJS won't use
execveat(AT_CHECK), we will leave that work to the web browser/java
runtime's sandboxer.
This is good because the scope is more narrow/clear.

Thanks
-Jeff

> I'll talk about this use case in the next patch series.
>
> >
> > > We should follow the principle of least astonishment.  What most users
> > > would expect?  This should follow the *common usage* of executable
> > > files.  At the end, the script interpreters will be patched by security
> > > folks for security reasons.  I think the right question to ask should
> > > be: could this file format be (ab)used to leak or modify arbitrary
> > > files, or to perform arbitrary syscalls?  If the answer is yes, then it
> > > should be checked for executability.  Of course, this excludes bugs
> > > exploited in the file format parser.
> > >
> > > I'll extend the next patch series with this rationale.
> > >
> >

Andy Lutomirski July 20, 2024, 1:59 a.m. UTC | #44

> On Jul 18, 2024, at 8:22 PM, Mickaël Salaün <mic@digikod.net> wrote:
>
> On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote:
>>>> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote:
>>>
>>> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
>>>>> On 17/07/2024 07:33, Jeff Xu wrote:
>>>>> Consider those cases: I think:
>>>>> a> relying purely on userspace for enforcement does't seem to be
>>>>> effective,  e.g. it is trivial  to call open(), then mmap() it into
>>>>> executable memory.
>>>>
>>>> If there's a way to do this without running executable code that had to pass
>>>> a previous execveat() check, then yeah, it's not effective (e.g. a Python
>>>> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
>>>>
>>>> Once arbitrary code is running, all bets are off. So long as all arbitrary
>>>> code is being checked itself, it's allowed to do things that would bypass
>>>> later checks (and it's up to whoever audited it in the first place to
>>>> prevent this by not giving it the special mark that allows it to pass the
>>>> check).
>>>
>>> Exactly.  As explained in the patches, one crucial prerequisite is that
>>> the executable code is trusted, and the system must provide integrity
>>> guarantees.  We cannot do anything without that.  This patches series is
>>> a building block to fix a blind spot on Linux systems to be able to
>>> fully control executability.
>>
>> Circling back to my previous comment (did that ever get noticed?), I
>
> Yes, I replied to your comments.  Did I miss something?

I missed that email in the pile, sorry. I’ll reply separately.

>
>> don’t think this is quite right:
>>
>> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/
>>
>> On a basic system configuration, a given path either may or may not be
>> executed. And maybe that path has some integrity check (dm-verity,
>> etc).  So the kernel should tell the interpreter/loader whether the
>> target may be executed. All fine.
>>
>> But I think the more complex cases are more interesting, and the
>> “execute a program” process IS NOT BINARY.  An attempt to execute can
>> be rejected outright, or it can be allowed *with a change to creds or
>> security context*.  It would be entirely reasonable to have a policy
>> that allows execution of non-integrity-checked files but in a very
>> locked down context only.
>
> I guess you mean to transition to a sandbox when executing an untrusted
> file.  This is a good idea.  I talked about role transition in the
> patch's description:
>
> With the information that a script interpreter is about to interpret a
> script, an LSM security policy can adjust caller's access rights or log
> execution request as for native script execution (e.g. role transition).
> This is possible thanks to the call to security_bprm_creds_for_exec().

…

> This patch series brings the minimal building blocks to have a
> consistent execution environment.  Role transitions for script execution
> are left to LSMs.  For instance, we could extend Landlock to
> automatically sandbox untrusted scripts.

I’m not really convinced.  There’s more to building an API that
enables LSM hooks than merely sticking the hook somewhere in kernel
code. It needs to be a defined API. If you call an operation “check”,
then people will expect it to check, not to change the caller’s
credentials.  And people will mess it up in both directions (e.g.
callers will call it and then open try to load some library that they
should have loaded first, or callers will call it and forget to close
fds first.

And there should probably be some interaction with dumpable as well.
If I “check” a file for executability, that should not suddenly allow
someone to ptrace me?

And callers need to know to exit on failure, not carry on.


More concretely, a runtime that fully opts in to this may well "check"
multiple things.  For example, if I do:

$ ld.so ~/.local/bin/some_program   (i.e. I literally execve ld.so)

then ld.so will load several things:

~/.local/bin/some_program
libc.so
other random DSOs, some of which may well be in my home directory

And for all ld.so knows, some_program is actually an interpreter and
will "check" something else.  And the LSMs have absolutely no clue
what's what.  So I think for this to work right, the APIs need to be a
lot more expressive and explicit:

check_library(fd to libc.so);  <-- does not transition or otherwise drop privs
check_transition_main_program(fd to ~/.local/bin/some_program);  <--
may drop privs

and if some_program is really an interpreter, then it will do:

check_library(fd to some thing imported by the script);
check_transition_main_program(fd to the actual script);

And maybe that takes a parameter that gets run eval-style:

check_unsafe_user_script("actual contents of snippet");

The actual spelling of all this doesn't matter so much.  But the user
code and the kernel code need to be on the same page as to what the
user program is doing and what it's asking the kernel program to do.

--Andy

Jarkko Sakkinen July 20, 2024, 11:43 a.m. UTC | #45

On Sat Jul 20, 2024 at 4:59 AM EEST, Andy Lutomirski wrote:
> > On Jul 18, 2024, at 8:22 PM, Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote:
> >>>> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote:
> >>>
> >>> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> >>>>> On 17/07/2024 07:33, Jeff Xu wrote:
> >>>>> Consider those cases: I think:
> >>>>> a> relying purely on userspace for enforcement does't seem to be
> >>>>> effective,  e.g. it is trivial  to call open(), then mmap() it into
> >>>>> executable memory.
> >>>>
> >>>> If there's a way to do this without running executable code that had to pass
> >>>> a previous execveat() check, then yeah, it's not effective (e.g. a Python
> >>>> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> >>>>
> >>>> Once arbitrary code is running, all bets are off. So long as all arbitrary
> >>>> code is being checked itself, it's allowed to do things that would bypass
> >>>> later checks (and it's up to whoever audited it in the first place to
> >>>> prevent this by not giving it the special mark that allows it to pass the
> >>>> check).
> >>>
> >>> Exactly.  As explained in the patches, one crucial prerequisite is that
> >>> the executable code is trusted, and the system must provide integrity
> >>> guarantees.  We cannot do anything without that.  This patches series is
> >>> a building block to fix a blind spot on Linux systems to be able to
> >>> fully control executability.
> >>
> >> Circling back to my previous comment (did that ever get noticed?), I
> >
> > Yes, I replied to your comments.  Did I miss something?
>
> I missed that email in the pile, sorry. I’ll reply separately.
>
> >
> >> don’t think this is quite right:
> >>
> >> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/
> >>
> >> On a basic system configuration, a given path either may or may not be
> >> executed. And maybe that path has some integrity check (dm-verity,
> >> etc).  So the kernel should tell the interpreter/loader whether the
> >> target may be executed. All fine.
> >>
> >> But I think the more complex cases are more interesting, and the
> >> “execute a program” process IS NOT BINARY.  An attempt to execute can
> >> be rejected outright, or it can be allowed *with a change to creds or
> >> security context*.  It would be entirely reasonable to have a policy
> >> that allows execution of non-integrity-checked files but in a very
> >> locked down context only.
> >
> > I guess you mean to transition to a sandbox when executing an untrusted
> > file.  This is a good idea.  I talked about role transition in the
> > patch's description:
> >
> > With the information that a script interpreter is about to interpret a
> > script, an LSM security policy can adjust caller's access rights or log
> > execution request as for native script execution (e.g. role transition).
> > This is possible thanks to the call to security_bprm_creds_for_exec().
>
> …
>
> > This patch series brings the minimal building blocks to have a
> > consistent execution environment.  Role transitions for script execution
> > are left to LSMs.  For instance, we could extend Landlock to
> > automatically sandbox untrusted scripts.
>
> I’m not really convinced.  There’s more to building an API that
> enables LSM hooks than merely sticking the hook somewhere in kernel
> code. It needs to be a defined API. If you call an operation “check”,
> then people will expect it to check, not to change the caller’s
> credentials.  And people will mess it up in both directions (e.g.
> callers will call it and then open try to load some library that they
> should have loaded first, or callers will call it and forget to close
> fds first.
>
> And there should probably be some interaction with dumpable as well.
> If I “check” a file for executability, that should not suddenly allow
> someone to ptrace me?
>
> And callers need to know to exit on failure, not carry on.
>
>
> More concretely, a runtime that fully opts in to this may well "check"
> multiple things.  For example, if I do:
>
> $ ld.so ~/.local/bin/some_program   (i.e. I literally execve ld.so)
>
> then ld.so will load several things:
>
> ~/.local/bin/some_program
> libc.so
> other random DSOs, some of which may well be in my home directory

What would really help to comprehend this patch set would be a set of
test scripts, preferably something that you can run easily with
BuildRoot or similar.

Scripts would demonstrate the use cases for the patch set. Then it
would be easier to develop scripts that would underline the corner
cases. I would keep all this out of kselftest shenanigans for now.

I feel that the patch set is hovering in abstractions with examples
that you cannot execute.

I added the patches to standard test CI hack:

https://codeberg.org/jarkko/linux-tpmdd-test

But after I booted up a kernel I had no idea what to do with it. And
all this lenghty discussion makes it even more confusing.

Please find some connection to the real world before sending any new
version of this (e.g. via test scripts). I think this should not be
pulled before almost anyone doing kernel dev can comprehend the "gist"
at least in some reasonable level.

BR, Jarkko

Mickaël Salaün July 23, 2024, 1:15 p.m. UTC | #46

On Fri, Jul 19, 2024 at 08:27:18AM -0700, Jeff Xu wrote:
> On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote:
> > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > > >
> > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > > > >
> > > > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > > > real execution, user space gets the same error codes.
> > > > > > > > > >
> > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > > > exec, shared object, script and config file (such as seccomp config),
> > > >
> > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > > > > >
> > > > > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > > > > Did I miss something?
> > > > > > > >
> > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > > > > The app can choose its own dynamic linker path during build, (maybe
> > > > > > > even statically link one ?)  This is another reason that relying on a
> > > > > > > userspace only is not enough.
> > > > > >
> > > > > > The kernel calls open_exec() on all dependencies, including
> > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > > > > >
> > > > > This might not be entirely true. iiuc, kernel  calls open_exec for
> > > > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6)
> > > >
> > > > Correct, the dynamic linker is in charge of that, which is why it must
> > > > be enlighten with execveat+AT_CHECK and securebits checks.
> > > >
> > > > > load_elf_binary() {
> > > > >    interpreter = open_exec(elf_interpreter);
> > > > > }
> > > > >
> > > > > libc.so.6 is opened and mapped by dynamic linker.
> > > > > so the call sequence is:
> > > > >  execve(a.out)
> > > > >   - open exec(a.out)
> > > > >   - security_bprm_creds(a.out)
> > > > >   - open the exec(ld.so)
> > > > >   - call open_exec() for interruptor (ld.so)
> > > > >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > > > > the same check and code path as libc.so below ?
> > > >
> > > > open_exec() checks are enough.  LSMs can use this information (open +
> > > > __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> > > > request.
> > > >
> > > Then the ld.so doesn't go through the same security_bprm_creds() check
> > > as other .so.
> >
> > Indeed, but...
> >
> My point is: we will want all the .so going through the same code
> path, so  security_ functions are called consistently across all the
> objects, And in the future, if we want to develop additional LSM
> functionality based on AT_CHECK, it will be applied to all objects.

I'll extend the doc to encourage LSMs to check for __FMODE_EXEC, which
already is the common security check for all executable dependencies.
As extra information, they can get explicit requests by looking at
execveat+AT_CHECK call.

> 
> Another thing to consider is:  we are asking userspace to make
> additional syscall before  loading the file into memory/get executed,
> there is a possibility for future expansion of the mechanism, without
> asking user space to add another syscall again.

AT_CHECK is defined with a specific semantic.  Other mechanisms (e.g.
LSM policies) could enforce other restrictions following the same
semantic.  We need to keep in mind backward compatibility.

> 
> I m still not convinced yet that execveat(AT_CHECK) fits more than
> faccessat(AT_CHECK)

faccessat2(2) is dedicated to file permission/attribute check.
execveat(2) is dedicated to execution, which is a superset of file
permission for executability, plus other checks (e.g. noexec).

> 
> 
> > >
> > > As my previous email, the ChromeOS LSM restricts executable mfd
> > > through security_bprm_creds(), the end result is that ld.so can still
> > > be executable memfd, but not other .so.
> >
> > The chromeOS LSM can check that with the security_file_open() hook and
> > the __FMODE_EXEC flag, see Landlock's implementation.  I think this
> > should be the only hook implementation that chromeOS LSM needs to add.
> >
> > >
> > > One way to address this is to refactor the necessary code from
> > > execveat() code patch, and make it available to call from both kernel
> > > and execveat() code paths., but if we do that, we might as well use
> > > faccessat2(AT_CHECK)
> >
> > That's why I think it makes sense to rely on the existing __FMODE_EXEC
> > information.
> >
> > >
> > >
> > > > >   - transfer the control to ld.so)
> > > > >   - ld.so open (libc.so)
> > > > >   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> > > > > require dynamic linker change.
> > > > >   - ld.so mmap(libc.so,rx)
> > > >
> > > > Explaining these steps is useful. I'll include that in the next patch
> > > > series.
>

Mickaël Salaün July 23, 2024, 1:15 p.m. UTC | #47

On Fri, Jul 19, 2024 at 10:36:01AM -0700, Jeff Xu wrote:
> On Fri, Jul 19, 2024 at 8:31 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Fri, Jul 19, 2024 at 08:12:37AM -0700, Jeff Xu wrote:
> > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > >
> > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > >
> > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > real execution, user space gets the same error codes.
> > > > > > > >
> > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > exec, shared object, script and config file (such as seccomp config),
> > > > > >
> > > > > > "config file" that contains executable code.
> > > > > >
> > > > > Is seccomp config  considered as "contains executable code", seccomp
> > > > > config is translated into bpf, so maybe yes ? but bpf is running in
> > > > > the kernel.
> > > >
> > > > Because seccomp filters alter syscalls, they are similar to code
> > > > injection.
> > > >
> > > > >
> > > > > > > I'm still thinking  execveat(AT_CHECK) vs faccessat(AT_CHECK) in
> > > > > > > different use cases:
> > > > > > >
> > > > > > > execveat clearly has less code change, but that also means: we can't
> > > > > > > add logic specific to exec (i.e. logic that can't be applied to
> > > > > > > config) for this part (from do_execveat_common to
> > > > > > > security_bprm_creds_for_exec) in future.  This would require some
> > > > > > > agreement/sign-off, I'm not sure from whom.
> > > > > >
> > > > > > I'm not sure to follow. We could still add new flags, but for now I
> > > > > > don't see use cases.  This patch series is not meant to handle all
> > > > > > possible "trust checks", only executable code, which makes sense for the
> > > > > > kernel.
> > > > > >
> > > > > I guess the "configfile" discussion is where I get confused, at one
> > > > > point, I think this would become a generic "trust checks" api for
> > > > > everything related to "generating executable code", e.g. javascript,
> > > > > java code, and more.
> > > > > We will want to clearly define the scope of execveat(AT_CHECK)
> > > >
> > > > The line between data and code is blurry.  For instance, a configuration
> > > > file can impact the execution flow of a program.  So, where to draw the
> > > > line?
> > > >
> > > > It might makes sense to follow the kernel and interpreter semantic: if a
> > > > file can be executed by the kernel (e.g. ELF binary, file containing a
> > > > shebang, or just configured with binfmt_misc), then this should be
> > > > considered as executable code.  This applies to Bash, Python,
> > > > Javascript, NodeJS, PE, PHP...  However, we can also make a picture
> > > > executable with binfmt_misc.  So, again, where to draw the line?
> > > >
> > > > I'd recommend to think about interaction with the outside, through
> > > > function calls, IPCs, syscalls...  For instance, "running" an image
> > > > should not lead to reading or writing to arbitrary files, or accessing
> > > > the network, but in practice it is legitimate for some file formats...
> > > > PostScript is a programming language, but mostly used to draw pictures.
> > > > So, again, where to draw the line?
> > > >
> > > The javascript is run by browser and java code by java runtime, do
> > > they meet the criteria? they do not interact with the kernel directly,
> > > however they might have the same "executable" characteristics and the
> > > app might not want them to be put into non-exec mount.
> > >
> > > If the answer is yes, they can also use execveat(AT_CHECK),  the next
> > > question is: does it make sense for javacript/java code to go through
> > > execveat() code path, allocate bprm, etc ? (I don't have answer, maybe
> > > it is)
> >
> > Java and NodeJS can do arbitrary syscalls (through their runtime) and
> > they can access arbitrary files, so according to my below comment, yes
> > they should be managed as potentially dangerous executable code.
> >
> > The question should be: is this code trusted? Most of the time it is
> > not, hence the security model of web browser and their heavy use of
> > sandboxing.  So no, I don't think it would make sense to check this kind
> > of code more than what the browser already do.
> >
> 
> If I understand you correctly, Java/NodeJS won't use
> execveat(AT_CHECK), we will leave that work to the web browser/java
> runtime's sandboxer.
> This is good because the scope is more narrow/clear.

Yes for browser's sandboxes because the code comes from the network
(i.e. not authenticated at the kernel level, and mostly untrusted).

For standalone Java applications (stored in the filesystem), the Java
runtime(s) should be patched as other script interpreters.

> 
> Thanks
> -Jeff
> 
> > I'll talk about this use case in the next patch series.
> >
> > >
> > > > We should follow the principle of least astonishment.  What most users
> > > > would expect?  This should follow the *common usage* of executable
> > > > files.  At the end, the script interpreters will be patched by security
> > > > folks for security reasons.  I think the right question to ask should
> > > > be: could this file format be (ab)used to leak or modify arbitrary
> > > > files, or to perform arbitrary syscalls?  If the answer is yes, then it
> > > > should be checked for executability.  Of course, this excludes bugs
> > > > exploited in the file format parser.
> > > >
> > > > I'll extend the next patch series with this rationale.
> > > >
> > >

Mickaël Salaün July 23, 2024, 1:16 p.m. UTC | #48

On Sat, Jul 20, 2024 at 09:59:33AM +0800, Andy Lutomirski wrote:
> > On Jul 18, 2024, at 8:22 PM, Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote:
> >>>> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote:
> >>>
> >>> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> >>>>> On 17/07/2024 07:33, Jeff Xu wrote:
> >>>>> Consider those cases: I think:
> >>>>> a> relying purely on userspace for enforcement does't seem to be
> >>>>> effective,  e.g. it is trivial  to call open(), then mmap() it into
> >>>>> executable memory.
> >>>>
> >>>> If there's a way to do this without running executable code that had to pass
> >>>> a previous execveat() check, then yeah, it's not effective (e.g. a Python
> >>>> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> >>>>
> >>>> Once arbitrary code is running, all bets are off. So long as all arbitrary
> >>>> code is being checked itself, it's allowed to do things that would bypass
> >>>> later checks (and it's up to whoever audited it in the first place to
> >>>> prevent this by not giving it the special mark that allows it to pass the
> >>>> check).
> >>>
> >>> Exactly.  As explained in the patches, one crucial prerequisite is that
> >>> the executable code is trusted, and the system must provide integrity
> >>> guarantees.  We cannot do anything without that.  This patches series is
> >>> a building block to fix a blind spot on Linux systems to be able to
> >>> fully control executability.
> >>
> >> Circling back to my previous comment (did that ever get noticed?), I
> >
> > Yes, I replied to your comments.  Did I miss something?
> 
> I missed that email in the pile, sorry. I’ll reply separately.
> 
> >
> >> don’t think this is quite right:
> >>
> >> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/
> >>
> >> On a basic system configuration, a given path either may or may not be
> >> executed. And maybe that path has some integrity check (dm-verity,
> >> etc).  So the kernel should tell the interpreter/loader whether the
> >> target may be executed. All fine.
> >>
> >> But I think the more complex cases are more interesting, and the
> >> “execute a program” process IS NOT BINARY.  An attempt to execute can
> >> be rejected outright, or it can be allowed *with a change to creds or
> >> security context*.  It would be entirely reasonable to have a policy
> >> that allows execution of non-integrity-checked files but in a very
> >> locked down context only.
> >
> > I guess you mean to transition to a sandbox when executing an untrusted
> > file.  This is a good idea.  I talked about role transition in the
> > patch's description:
> >
> > With the information that a script interpreter is about to interpret a
> > script, an LSM security policy can adjust caller's access rights or log
> > execution request as for native script execution (e.g. role transition).
> > This is possible thanks to the call to security_bprm_creds_for_exec().
> 
> …
> 
> > This patch series brings the minimal building blocks to have a
> > consistent execution environment.  Role transitions for script execution
> > are left to LSMs.  For instance, we could extend Landlock to
> > automatically sandbox untrusted scripts.
> 
> I’m not really convinced.  There’s more to building an API that
> enables LSM hooks than merely sticking the hook somewhere in kernel
> code. It needs to be a defined API. If you call an operation “check”,
> then people will expect it to check, not to change the caller’s
> credentials.  And people will mess it up in both directions (e.g.
> callers will call it and then open try to load some library that they
> should have loaded first, or callers will call it and forget to close
> fds first.
> 
> And there should probably be some interaction with dumpable as well.
> If I “check” a file for executability, that should not suddenly allow
> someone to ptrace me?
> 
> And callers need to know to exit on failure, not carry on.
> 
> 
> More concretely, a runtime that fully opts in to this may well "check"
> multiple things.  For example, if I do:
> 
> $ ld.so ~/.local/bin/some_program   (i.e. I literally execve ld.so)
> 
> then ld.so will load several things:
> 
> ~/.local/bin/some_program
> libc.so
> other random DSOs, some of which may well be in my home directory
> 
> And for all ld.so knows, some_program is actually an interpreter and
> will "check" something else.  And the LSMs have absolutely no clue
> what's what.  So I think for this to work right, the APIs need to be a
> lot more expressive and explicit:
> 
> check_library(fd to libc.so);  <-- does not transition or otherwise drop privs
> check_transition_main_program(fd to ~/.local/bin/some_program);  <--
> may drop privs
> 
> and if some_program is really an interpreter, then it will do:
> 
> check_library(fd to some thing imported by the script);
> check_transition_main_program(fd to the actual script);
> 
> And maybe that takes a parameter that gets run eval-style:
> 
> check_unsafe_user_script("actual contents of snippet");
> 
> The actual spelling of all this doesn't matter so much.  But the user
> code and the kernel code need to be on the same page as to what the
> user program is doing and what it's asking the kernel program to do.

I agree.  I'll remove any references to "role transition".  This kind of
feature should come with something like getpeercon/setexeccon(3).

> 
> --Andy
>

Mickaël Salaün July 23, 2024, 1:16 p.m. UTC | #49

On Sat, Jul 20, 2024 at 02:43:41PM +0300, Jarkko Sakkinen wrote:
> On Sat Jul 20, 2024 at 4:59 AM EEST, Andy Lutomirski wrote:
> > > On Jul 18, 2024, at 8:22 PM, Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote:
> > >>>> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote:
> > >>>
> > >>> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote:
> > >>>>> On 17/07/2024 07:33, Jeff Xu wrote:
> > >>>>> Consider those cases: I think:
> > >>>>> a> relying purely on userspace for enforcement does't seem to be
> > >>>>> effective,  e.g. it is trivial  to call open(), then mmap() it into
> > >>>>> executable memory.
> > >>>>
> > >>>> If there's a way to do this without running executable code that had to pass
> > >>>> a previous execveat() check, then yeah, it's not effective (e.g. a Python
> > >>>> interpreter that *doesn't* enforce execveat() is a trivial way to do it).
> > >>>>
> > >>>> Once arbitrary code is running, all bets are off. So long as all arbitrary
> > >>>> code is being checked itself, it's allowed to do things that would bypass
> > >>>> later checks (and it's up to whoever audited it in the first place to
> > >>>> prevent this by not giving it the special mark that allows it to pass the
> > >>>> check).
> > >>>
> > >>> Exactly.  As explained in the patches, one crucial prerequisite is that
> > >>> the executable code is trusted, and the system must provide integrity
> > >>> guarantees.  We cannot do anything without that.  This patches series is
> > >>> a building block to fix a blind spot on Linux systems to be able to
> > >>> fully control executability.
> > >>
> > >> Circling back to my previous comment (did that ever get noticed?), I
> > >
> > > Yes, I replied to your comments.  Did I miss something?
> >
> > I missed that email in the pile, sorry. I’ll reply separately.
> >
> > >
> > >> don’t think this is quite right:
> > >>
> > >> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/
> > >>
> > >> On a basic system configuration, a given path either may or may not be
> > >> executed. And maybe that path has some integrity check (dm-verity,
> > >> etc).  So the kernel should tell the interpreter/loader whether the
> > >> target may be executed. All fine.
> > >>
> > >> But I think the more complex cases are more interesting, and the
> > >> “execute a program” process IS NOT BINARY.  An attempt to execute can
> > >> be rejected outright, or it can be allowed *with a change to creds or
> > >> security context*.  It would be entirely reasonable to have a policy
> > >> that allows execution of non-integrity-checked files but in a very
> > >> locked down context only.
> > >
> > > I guess you mean to transition to a sandbox when executing an untrusted
> > > file.  This is a good idea.  I talked about role transition in the
> > > patch's description:
> > >
> > > With the information that a script interpreter is about to interpret a
> > > script, an LSM security policy can adjust caller's access rights or log
> > > execution request as for native script execution (e.g. role transition).
> > > This is possible thanks to the call to security_bprm_creds_for_exec().
> >
> > …
> >
> > > This patch series brings the minimal building blocks to have a
> > > consistent execution environment.  Role transitions for script execution
> > > are left to LSMs.  For instance, we could extend Landlock to
> > > automatically sandbox untrusted scripts.
> >
> > I’m not really convinced.  There’s more to building an API that
> > enables LSM hooks than merely sticking the hook somewhere in kernel
> > code. It needs to be a defined API. If you call an operation “check”,
> > then people will expect it to check, not to change the caller’s
> > credentials.  And people will mess it up in both directions (e.g.
> > callers will call it and then open try to load some library that they
> > should have loaded first, or callers will call it and forget to close
> > fds first.
> >
> > And there should probably be some interaction with dumpable as well.
> > If I “check” a file for executability, that should not suddenly allow
> > someone to ptrace me?
> >
> > And callers need to know to exit on failure, not carry on.
> >
> >
> > More concretely, a runtime that fully opts in to this may well "check"
> > multiple things.  For example, if I do:
> >
> > $ ld.so ~/.local/bin/some_program   (i.e. I literally execve ld.so)
> >
> > then ld.so will load several things:
> >
> > ~/.local/bin/some_program
> > libc.so
> > other random DSOs, some of which may well be in my home directory
> 
> What would really help to comprehend this patch set would be a set of
> test scripts, preferably something that you can run easily with
> BuildRoot or similar.
> 
> Scripts would demonstrate the use cases for the patch set. Then it
> would be easier to develop scripts that would underline the corner
> cases. I would keep all this out of kselftest shenanigans for now.

I'll include a toy script interpreter with the next patch series.  This
one was an RFC.

> 
> I feel that the patch set is hovering in abstractions with examples
> that you cannot execute.
> 
> I added the patches to standard test CI hack:
> 
> https://codeberg.org/jarkko/linux-tpmdd-test
> 
> But after I booted up a kernel I had no idea what to do with it. And
> all this lenghty discussion makes it even more confusing.

You can run the tests in the CI.

> 
> Please find some connection to the real world before sending any new
> version of this (e.g. via test scripts). I think this should not be
> pulled before almost anyone doing kernel dev can comprehend the "gist"
> at least in some reasonable level.

You'll find in this patch series (cover letter, patch description, and
comments) connection to the real world. :)
The next patch series should take into account the current discussions.

> 
> BR, Jarkko

Jeff Xu Aug. 5, 2024, 6:35 p.m. UTC | #50

On Tue, Jul 23, 2024 at 6:15 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Fri, Jul 19, 2024 at 08:27:18AM -0700, Jeff Xu wrote:
> > On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote:
> > > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > >
> > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > > > > >
> > > > > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > > > > real execution, user space gets the same error codes.
> > > > > > > > > > >
> > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > > > > exec, shared object, script and config file (such as seccomp config),
> > > > >
> > > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > > > > > >
> > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > > > > > Did I miss something?
> > > > > > > > >
> > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > > > > > The app can choose its own dynamic linker path during build, (maybe
> > > > > > > > even statically link one ?)  This is another reason that relying on a
> > > > > > > > userspace only is not enough.
> > > > > > >
> > > > > > > The kernel calls open_exec() on all dependencies, including
> > > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > > > > > >
> > > > > > This might not be entirely true. iiuc, kernel  calls open_exec for
> > > > > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6)
> > > > >
> > > > > Correct, the dynamic linker is in charge of that, which is why it must
> > > > > be enlighten with execveat+AT_CHECK and securebits checks.
> > > > >
> > > > > > load_elf_binary() {
> > > > > >    interpreter = open_exec(elf_interpreter);
> > > > > > }
> > > > > >
> > > > > > libc.so.6 is opened and mapped by dynamic linker.
> > > > > > so the call sequence is:
> > > > > >  execve(a.out)
> > > > > >   - open exec(a.out)
> > > > > >   - security_bprm_creds(a.out)
> > > > > >   - open the exec(ld.so)
> > > > > >   - call open_exec() for interruptor (ld.so)
> > > > > >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > > > > > the same check and code path as libc.so below ?
> > > > >
> > > > > open_exec() checks are enough.  LSMs can use this information (open +
> > > > > __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> > > > > request.
> > > > >
> > > > Then the ld.so doesn't go through the same security_bprm_creds() check
> > > > as other .so.
> > >
> > > Indeed, but...
> > >
> > My point is: we will want all the .so going through the same code
> > path, so  security_ functions are called consistently across all the
> > objects, And in the future, if we want to develop additional LSM
> > functionality based on AT_CHECK, it will be applied to all objects.
>
> I'll extend the doc to encourage LSMs to check for __FMODE_EXEC, which
> already is the common security check for all executable dependencies.
> As extra information, they can get explicit requests by looking at
> execveat+AT_CHECK call.
>
I agree that security_file_open + __FMODE_EXEC for checking all
the .so (e.g for executable memfd) is a better option  than checking at
security_bprm_creds_for_exec.

But then maybe execveat( AT_CHECK) can return after  calling alloc_bprm ?
See below call graph:

do_execveat_common (AT_CHECK)
-> alloc_bprm
->->do_open_execat
->->-> do_filp_open (__FMODE_EXEC)
->->->->->->> security_file_open
-> bprm_execve
->-> prepare_exec_creds
->->-> prepare_creds
->->->-> security_prepare_creds
->-> security_bprm_creds_for_exec

What is the consideration to mark the end at
security_bprm_creds_for_exec ? i.e. including brpm_execve,
prepare_creds, security_prepare_creds, security_bprm_creds_for_exec.

Since dynamic linker doesn't load ld.so (it is by kernel),  ld.so
won't go through those  security_prepare_creds and
security_bprm_creds_for_exec checks like other .so do.

> >
> > Another thing to consider is:  we are asking userspace to make
> > additional syscall before  loading the file into memory/get executed,
> > there is a possibility for future expansion of the mechanism, without
> > asking user space to add another syscall again.
>
> AT_CHECK is defined with a specific semantic.  Other mechanisms (e.g.
> LSM policies) could enforce other restrictions following the same
> semantic.  We need to keep in mind backward compatibility.
>
> >
> > I m still not convinced yet that execveat(AT_CHECK) fits more than
> > faccessat(AT_CHECK)
>
> faccessat2(2) is dedicated to file permission/attribute check.
> execveat(2) is dedicated to execution, which is a superset of file
> permission for executability, plus other checks (e.g. noexec).
>
That sounds reasonable, but if execveat(AT_CHECK) changes behavior of
execveat(),  someone might argue that faccessat2(EXEC_CHECK) can be
made for the executability.

I think the decision might depend on what this PATCH intended to
check, i.e. where we draw the line.

do_open_execat() seems to cover lots of checks for executability, if
we are ok with the thing that do_open_execat() checks, then
faccessat(AT_CHECK) calling do_open_execat() is an option, it  won't
have those "unrelated" calls  in execve path, e.g.  bprm_stack_limits,
copy argc/env .

However, you mentioned superset of file permission for executability,
can you elaborate on that ? Is there something not included in
do_open_execat() but still necessary for execveat(AT_CHECK)? maybe
security_bprm_creds_for_exec? (this goes back to my  question above)

Thanks
Best regards,
-Jeff











> >
> >
> > > >
> > > > As my previous email, the ChromeOS LSM restricts executable mfd
> > > > through security_bprm_creds(), the end result is that ld.so can still
> > > > be executable memfd, but not other .so.
> > >
> > > The chromeOS LSM can check that with the security_file_open() hook and
> > > the __FMODE_EXEC flag, see Landlock's implementation.  I think this
> > > should be the only hook implementation that chromeOS LSM needs to add.
> > >
> > > >
> > > > One way to address this is to refactor the necessary code from
> > > > execveat() code patch, and make it available to call from both kernel
> > > > and execveat() code paths., but if we do that, we might as well use
> > > > faccessat2(AT_CHECK)
> > >
> > > That's why I think it makes sense to rely on the existing __FMODE_EXEC
> > > information.
> > >
> > > >
> > > >
> > > > > >   - transfer the control to ld.so)
> > > > > >   - ld.so open (libc.so)
> > > > > >   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> > > > > > require dynamic linker change.
> > > > > >   - ld.so mmap(libc.so,rx)
> > > > >
> > > > > Explaining these steps is useful. I'll include that in the next patch
> > > > > series.
> >

Mickaël Salaün Aug. 9, 2024, 8:45 a.m. UTC | #51

On Mon, Aug 05, 2024 at 11:35:09AM -0700, Jeff Xu wrote:
> On Tue, Jul 23, 2024 at 6:15 AM Mickaël Salaün <mic@digikod.net> wrote:
> >
> > On Fri, Jul 19, 2024 at 08:27:18AM -0700, Jeff Xu wrote:
> > > On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > >
> > > > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote:
> > > > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > >
> > > > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > > > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > >
> > > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > > > > > >
> > > > > > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > > > > > real execution, user space gets the same error codes.
> > > > > > > > > > > >
> > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > > > > > exec, shared object, script and config file (such as seccomp config),
> > > > > >
> > > > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > > > > > > >
> > > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > > > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > > > > > > Did I miss something?
> > > > > > > > > >
> > > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > > > > > > The app can choose its own dynamic linker path during build, (maybe
> > > > > > > > > even statically link one ?)  This is another reason that relying on a
> > > > > > > > > userspace only is not enough.
> > > > > > > >
> > > > > > > > The kernel calls open_exec() on all dependencies, including
> > > > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > > > > > > >
> > > > > > > This might not be entirely true. iiuc, kernel  calls open_exec for
> > > > > > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6)
> > > > > >
> > > > > > Correct, the dynamic linker is in charge of that, which is why it must
> > > > > > be enlighten with execveat+AT_CHECK and securebits checks.
> > > > > >
> > > > > > > load_elf_binary() {
> > > > > > >    interpreter = open_exec(elf_interpreter);
> > > > > > > }
> > > > > > >
> > > > > > > libc.so.6 is opened and mapped by dynamic linker.
> > > > > > > so the call sequence is:
> > > > > > >  execve(a.out)
> > > > > > >   - open exec(a.out)
> > > > > > >   - security_bprm_creds(a.out)
> > > > > > >   - open the exec(ld.so)
> > > > > > >   - call open_exec() for interruptor (ld.so)
> > > > > > >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > > > > > > the same check and code path as libc.so below ?
> > > > > >
> > > > > > open_exec() checks are enough.  LSMs can use this information (open +
> > > > > > __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> > > > > > request.
> > > > > >
> > > > > Then the ld.so doesn't go through the same security_bprm_creds() check
> > > > > as other .so.
> > > >
> > > > Indeed, but...
> > > >
> > > My point is: we will want all the .so going through the same code
> > > path, so  security_ functions are called consistently across all the
> > > objects, And in the future, if we want to develop additional LSM
> > > functionality based on AT_CHECK, it will be applied to all objects.
> >
> > I'll extend the doc to encourage LSMs to check for __FMODE_EXEC, which
> > already is the common security check for all executable dependencies.
> > As extra information, they can get explicit requests by looking at
> > execveat+AT_CHECK call.
> >
> I agree that security_file_open + __FMODE_EXEC for checking all
> the .so (e.g for executable memfd) is a better option  than checking at
> security_bprm_creds_for_exec.
> 
> But then maybe execveat( AT_CHECK) can return after  calling alloc_bprm ?
> See below call graph:
> 
> do_execveat_common (AT_CHECK)
> -> alloc_bprm
> ->->do_open_execat
> ->->-> do_filp_open (__FMODE_EXEC)
> ->->->->->->> security_file_open
> -> bprm_execve
> ->-> prepare_exec_creds
> ->->-> prepare_creds
> ->->->-> security_prepare_creds
> ->-> security_bprm_creds_for_exec
> 
> What is the consideration to mark the end at
> security_bprm_creds_for_exec ? i.e. including brpm_execve,
> prepare_creds, security_prepare_creds, security_bprm_creds_for_exec.

This enables LSMs to know/log an explicit execution request, including
context with argv and envp.

> 
> Since dynamic linker doesn't load ld.so (it is by kernel),  ld.so
> won't go through those  security_prepare_creds and
> security_bprm_creds_for_exec checks like other .so do.

Yes, but this is not an issue nor an explicit request. ld.so is only one
case of this patch series.

> 
> > >
> > > Another thing to consider is:  we are asking userspace to make
> > > additional syscall before  loading the file into memory/get executed,
> > > there is a possibility for future expansion of the mechanism, without
> > > asking user space to add another syscall again.
> >
> > AT_CHECK is defined with a specific semantic.  Other mechanisms (e.g.
> > LSM policies) could enforce other restrictions following the same
> > semantic.  We need to keep in mind backward compatibility.
> >
> > >
> > > I m still not convinced yet that execveat(AT_CHECK) fits more than
> > > faccessat(AT_CHECK)
> >
> > faccessat2(2) is dedicated to file permission/attribute check.
> > execveat(2) is dedicated to execution, which is a superset of file
> > permission for executability, plus other checks (e.g. noexec).
> >
> That sounds reasonable, but if execveat(AT_CHECK) changes behavior of
> execveat(),  someone might argue that faccessat2(EXEC_CHECK) can be
> made for the executability.

AT_CHECK, as any other syscall flags, changes the behavior of execveat,
but the overall semantic is clearly defined.

Again, faccessat2 is only dedicated to file attributes/permissions, not
file executability.

> 
> I think the decision might depend on what this PATCH intended to
> check, i.e. where we draw the line.

The goal is clearly defined in the cover letter and patches: makes it
possible to control (or log) script execution.

> 
> do_open_execat() seems to cover lots of checks for executability, if
> we are ok with the thing that do_open_execat() checks, then
> faccessat(AT_CHECK) calling do_open_execat() is an option, it  won't
> have those "unrelated" calls  in execve path, e.g.  bprm_stack_limits,
> copy argc/env .

I don't thing there is any unrelated calls in execve path, quite the
contrary, it follows the same semantic as for a full execution, and
that's another argument to use the execveat interface.  Otherwise, we
couldn't argue that `./script.sh` can be the same as `sh script.sh`

The only difference is that user space is in charge of parsing and
interpreting the file's content.

> 
> However, you mentioned superset of file permission for executability,
> can you elaborate on that ? Is there something not included in
> do_open_execat() but still necessary for execveat(AT_CHECK)? maybe
> security_bprm_creds_for_exec? (this goes back to my  question above)

As explained above, the goal is to have the same semantic as a full
execveat call, taking into account all the checks (e.g. stack limit,
argv/envp...).

> 
> Thanks
> Best regards,
> -Jeff
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> > >
> > >
> > > > >
> > > > > As my previous email, the ChromeOS LSM restricts executable mfd
> > > > > through security_bprm_creds(), the end result is that ld.so can still
> > > > > be executable memfd, but not other .so.
> > > >
> > > > The chromeOS LSM can check that with the security_file_open() hook and
> > > > the __FMODE_EXEC flag, see Landlock's implementation.  I think this
> > > > should be the only hook implementation that chromeOS LSM needs to add.
> > > >
> > > > >
> > > > > One way to address this is to refactor the necessary code from
> > > > > execveat() code patch, and make it available to call from both kernel
> > > > > and execveat() code paths., but if we do that, we might as well use
> > > > > faccessat2(AT_CHECK)
> > > >
> > > > That's why I think it makes sense to rely on the existing __FMODE_EXEC
> > > > information.
> > > >
> > > > >
> > > > >
> > > > > > >   - transfer the control to ld.so)
> > > > > > >   - ld.so open (libc.so)
> > > > > > >   - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch,
> > > > > > > require dynamic linker change.
> > > > > > >   - ld.so mmap(libc.so,rx)
> > > > > >
> > > > > > Explaining these steps is useful. I'll include that in the next patch
> > > > > > series.
> > >
>

Jeff Xu Aug. 9, 2024, 4:15 p.m. UTC | #52

On Fri, Aug 9, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
>
> On Mon, Aug 05, 2024 at 11:35:09AM -0700, Jeff Xu wrote:
> > On Tue, Jul 23, 2024 at 6:15 AM Mickaël Salaün <mic@digikod.net> wrote:
> > >
> > > On Fri, Jul 19, 2024 at 08:27:18AM -0700, Jeff Xu wrote:
> > > > On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > >
> > > > > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote:
> > > > > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > >
> > > > > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote:
> > > > > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote:
> > > > > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote:
> > > > > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be
> > > > > > > > > > > > > allowed for execution.  The main use case is for script interpreters and
> > > > > > > > > > > > > dynamic linkers to check execution permission according to the kernel's
> > > > > > > > > > > > > security policy. Another use case is to add context to access logs e.g.,
> > > > > > > > > > > > > which script (instead of interpreter) accessed a file.  As any
> > > > > > > > > > > > > executable code, scripts could also use this check [1].
> > > > > > > > > > > > >
> > > > > > > > > > > > > This is different than faccessat(2) which only checks file access
> > > > > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit,
> > > > > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials).
> > > > > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a
> > > > > > > > > > > > > real execution, user space gets the same error codes.
> > > > > > > > > > > > >
> > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the
> > > > > > > > > > > > exec, shared object, script and config file (such as seccomp config),
> > > > > > >
> > > > > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make
> > > > > > > > > > > > sure it passes AT_CHECK, before loading it into memory.
> > > > > > > > > > >
> > > > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which
> > > > > > > > > > > perform the main executability checks (with the __FMODE_EXEC flag).
> > > > > > > > > > > Did I miss something?
> > > > > > > > > > >
> > > > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel.
> > > > > > > > > > The app can choose its own dynamic linker path during build, (maybe
> > > > > > > > > > even statically link one ?)  This is another reason that relying on a
> > > > > > > > > > userspace only is not enough.
> > > > > > > > >
> > > > > > > > > The kernel calls open_exec() on all dependencies, including
> > > > > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too.
> > > > > > > > >
> > > > > > > > This might not be entirely true. iiuc, kernel  calls open_exec for
> > > > > > > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6)
> > > > > > >
> > > > > > > Correct, the dynamic linker is in charge of that, which is why it must
> > > > > > > be enlighten with execveat+AT_CHECK and securebits checks.
> > > > > > >
> > > > > > > > load_elf_binary() {
> > > > > > > >    interpreter = open_exec(elf_interpreter);
> > > > > > > > }
> > > > > > > >
> > > > > > > > libc.so.6 is opened and mapped by dynamic linker.
> > > > > > > > so the call sequence is:
> > > > > > > >  execve(a.out)
> > > > > > > >   - open exec(a.out)
> > > > > > > >   - security_bprm_creds(a.out)
> > > > > > > >   - open the exec(ld.so)
> > > > > > > >   - call open_exec() for interruptor (ld.so)
> > > > > > > >   - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through
> > > > > > > > the same check and code path as libc.so below ?
> > > > > > >
> > > > > > > open_exec() checks are enough.  LSMs can use this information (open +
> > > > > > > __FMODE_EXEC) if needed.  execveat+AT_CHECK is only a user space
> > > > > > > request.
> > > > > > >
> > > > > > Then the ld.so doesn't go through the same security_bprm_creds() check
> > > > > > as other .so.
> > > > >
> > > > > Indeed, but...
> > > > >
> > > > My point is: we will want all the .so going through the same code
> > > > path, so  security_ functions are called consistently across all the
> > > > objects, And in the future, if we want to develop additional LSM
> > > > functionality based on AT_CHECK, it will be applied to all objects.
> > >
> > > I'll extend the doc to encourage LSMs to check for __FMODE_EXEC, which
> > > already is the common security check for all executable dependencies.
> > > As extra information, they can get explicit requests by looking at
> > > execveat+AT_CHECK call.
> > >
> > I agree that security_file_open + __FMODE_EXEC for checking all
> > the .so (e.g for executable memfd) is a better option  than checking at
> > security_bprm_creds_for_exec.
> >
> > But then maybe execveat( AT_CHECK) can return after  calling alloc_bprm ?
> > See below call graph:
> >
> > do_execveat_common (AT_CHECK)
> > -> alloc_bprm
> > ->->do_open_execat
> > ->->-> do_filp_open (__FMODE_EXEC)
> > ->->->->->->> security_file_open
> > -> bprm_execve
> > ->-> prepare_exec_creds
> > ->->-> prepare_creds
> > ->->->-> security_prepare_creds
> > ->-> security_bprm_creds_for_exec
> >
> > What is the consideration to mark the end at
> > security_bprm_creds_for_exec ? i.e. including brpm_execve,
> > prepare_creds, security_prepare_creds, security_bprm_creds_for_exec.
>
> This enables LSMs to know/log an explicit execution request, including
> context with argv and envp.
>
> >
> > Since dynamic linker doesn't load ld.so (it is by kernel),  ld.so
> > won't go through those  security_prepare_creds and
> > security_bprm_creds_for_exec checks like other .so do.
>
> Yes, but this is not an issue nor an explicit request. ld.so is only one
> case of this patch series.
>
> >
> > > >
> > > > Another thing to consider is:  we are asking userspace to make
> > > > additional syscall before  loading the file into memory/get executed,
> > > > there is a possibility for future expansion of the mechanism, without
> > > > asking user space to add another syscall again.
> > >
> > > AT_CHECK is defined with a specific semantic.  Other mechanisms (e.g.
> > > LSM policies) could enforce other restrictions following the same
> > > semantic.  We need to keep in mind backward compatibility.
> > >
> > > >
> > > > I m still not convinced yet that execveat(AT_CHECK) fits more than
> > > > faccessat(AT_CHECK)
> > >
> > > faccessat2(2) is dedicated to file permission/attribute check.
> > > execveat(2) is dedicated to execution, which is a superset of file
> > > permission for executability, plus other checks (e.g. noexec).
> > >
> > That sounds reasonable, but if execveat(AT_CHECK) changes behavior of
> > execveat(),  someone might argue that faccessat2(EXEC_CHECK) can be
> > made for the executability.
>
> AT_CHECK, as any other syscall flags, changes the behavior of execveat,
> but the overall semantic is clearly defined.
>
> Again, faccessat2 is only dedicated to file attributes/permissions, not
> file executability.
>
> >
> > I think the decision might depend on what this PATCH intended to
> > check, i.e. where we draw the line.
>
> The goal is clearly defined in the cover letter and patches: makes it
> possible to control (or log) script execution.
>
> >
> > do_open_execat() seems to cover lots of checks for executability, if
> > we are ok with the thing that do_open_execat() checks, then
> > faccessat(AT_CHECK) calling do_open_execat() is an option, it  won't
> > have those "unrelated" calls  in execve path, e.g.  bprm_stack_limits,
> > copy argc/env .
>
> I don't thing there is any unrelated calls in execve path, quite the
> contrary, it follows the same semantic as for a full execution, and
> that's another argument to use the execveat interface.  Otherwise, we
> couldn't argue that `./script.sh` can be the same as `sh script.sh`
>
It is a good point from the  "scrip.sh/exec" perspective that we want
it to go through the same execve path.
The reasoning is not obvious from the ".so" which doesn't go through
stack/env check.
Since execveat(AT_CHECK) wants to cover both cases, it is fine.

> The only difference is that user space is in charge of parsing and
> interpreting the file's content.
>
> >
> > However, you mentioned superset of file permission for executability,
> > can you elaborate on that ? Is there something not included in
> > do_open_execat() but still necessary for execveat(AT_CHECK)? maybe
> > security_bprm_creds_for_exec? (this goes back to my  question above)
>
> As explained above, the goal is to have the same semantic as a full
> execveat call, taking into account all the checks (e.g. stack limit,
> argv/envp...).
>
I'm fine with this, thanks for taking time to explain the design.

Regarding the future LSM based on this patch series:
For .so,  security_file_open is recommended for LSM.
For scripts/exec (that needs a full exec code path),
security_file_open and security_bprm_creds_for_exec can both be used.

Thanks
Best regards,
-Jeff

[RFC,v19,1/5] exec: Add a new AT_CHECK flag to execveat(2)

Commit Message

Comments

Patch