Message ID | 20241112191858.162021-2-mic@digikod.net (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Script execution control (was O_MAYEXEC) | expand |
On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote: > > Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would > be allowed for execution. The main use case is for script interpreters > and dynamic linkers to check execution permission according to the > kernel's security policy. Another use case is to add context to access > logs e.g., which script (instead of interpreter) accessed a file. As > any executable code, scripts could also use this check [1]. > > This is different from faccessat(2) + X_OK which only checks a subset of > access rights (i.e. inode permission and mount options for regular > files), but not the full context (e.g. all LSM access checks). The main > use case for access(2) is for SUID processes to (partially) check access > on behalf of their caller. The main use case for execveat(2) + > AT_EXECVE_CHECK is to check if a script execution would be allowed, > according to all the different restrictions in place. Because the use > of AT_EXECVE_CHECK follows the exact kernel semantic as for a real > execution, user space gets the same error codes. > > An interesting point of using execveat(2) instead of openat2(2) is that > it decouples the check from the enforcement. Indeed, the security check > can be logged (e.g. with audit) without blocking an execution > environment not yet ready to enforce a strict security policy. > > LSMs can control or log execution requests with > security_bprm_creds_for_exec(). However, to enforce a consistent and > complete access control (e.g. on binary's dependencies) LSMs should > restrict file executability, or mesure executed files, with > security_file_open() by checking file->f_flags & __FMODE_EXEC. > > Because AT_EXECVE_CHECK is dedicated to user space interpreters, it > doesn't make sense for the kernel to parse the checked files, look for > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > if the format is unknown. Because of that, security_bprm_check() is > never called when AT_EXECVE_CHECK is used. > > It should be noted that script interpreters cannot directly use > execveat(2) (without this new AT_EXECVE_CHECK flag) because this could > lead to unexpected behaviors e.g., `python script.sh` could lead to Bash > being executed to interpret the script. Unlike the kernel, script > interpreters may just interpret the shebang as a simple comment, which > should not change for backward compatibility reasons. > > Because scripts or libraries files might not currently have the > executable permission set, or because we might want specific users to be > allowed to run arbitrary scripts, the following patch provides a dynamic > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and > SECBIT_EXEC_DENY_INTERACTIVE securebits. > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > This patch has been used for more than a decade with customized script > interpreters. Some examples can be found here: > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC > > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Christian Brauner <brauner@kernel.org> > Cc: Kees Cook <keescook@chromium.org> > Cc: Paul Moore <paul@paul-moore.com> > Reviewed-by: Serge Hallyn <serge@hallyn.com> > Link: https://docs.python.org/3/library/io.html#io.open_code [1] > Signed-off-by: Mickaël Salaün <mic@digikod.net> > Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net > --- > > Changes since v20: > * Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and > Serge Hallyn. > * Move the UAPI documentation to a dedicated RST file. > * Add Reviewed-by: Serge Hallyn > > Changes since v19: > * Remove mention of "role transition" as suggested by Andy. > * Highlight the difference between security_bprm_creds_for_exec() and > the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as > discussed with Jeff. > * Improve documentation both in UAPI comments and kernel comments > (requested by Kees). > > New design since v18: > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net > --- > Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++ > Documentation/userspace-api/index.rst | 1 + > fs/exec.c | 20 +++++++++++-- > include/linux/binfmts.h | 7 ++++- > include/uapi/linux/fcntl.h | 4 +++ > kernel/audit.h | 1 + > kernel/auditsc.c | 1 + > security/security.c | 10 +++++++ > 8 files changed, 75 insertions(+), 3 deletions(-) > create mode 100644 Documentation/userspace-api/check_exec.rst > > diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst > new file mode 100644 > index 000000000000..ad1aeaa5f6c0 > --- /dev/null > +++ b/Documentation/userspace-api/check_exec.rst > @@ -0,0 +1,34 @@ > +=================== > +Executability check > +=================== > + > +AT_EXECVE_CHECK > +=============== > + > +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a > +check on a regular file and returns 0 if execution of this file would be > +allowed, ignoring the file format and then the related interpreter dependencies > +(e.g. ELF libraries, script's shebang). > + > +Programs should always perform this check to apply kernel-level checks against > +files that are not directly executed by the kernel but passed to a user space > +interpreter instead. All files that contain executable code, from the point of > +view of the interpreter, should be checked. However the result of this check > +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or > +``SECBIT_EXEC_DENY_INTERACTIVE.``. Regarding "should only" Userspace (e.g. libc) could decide to enforce even when SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing doesn't make sense. When SECBIT_EXEC_RESTRICT_FILE=1, userspace is bound to enforce. > + > +The main purpose of this flag is to improve the security and consistency of an > +execution environment to ensure that direct file execution (e.g. > +``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to > +the same result. For instance, this can be used to check if a file is > +trustworthy according to the caller's environment. > + > +In a secure environment, libraries and any executable dependencies should also > +be checked. For instance, dynamic linking should make sure that all libraries > +are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``). > +For such secure execution environment to make sense, only trusted code should > +be executable, which also requires integrity guarantees. > + > +To avoid race conditions leading to time-of-check to time-of-use issues, > +``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a > +file descriptor instead of a path. > diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst > index 274cc7546efc..6272bcf11296 100644 > --- a/Documentation/userspace-api/index.rst > +++ b/Documentation/userspace-api/index.rst > @@ -35,6 +35,7 @@ Security-related interfaces > mfd_noexec > spec_ctrl > tee > + check_exec > > Devices and I/O > =============== > diff --git a/fs/exec.c b/fs/exec.c > index 6c53920795c2..bb83b6a39530 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -891,7 +891,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > .lookup_flags = LOOKUP_FOLLOW, > }; > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > + if ((flags & > + ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0) > return ERR_PTR(-EINVAL); > if (flags & AT_SYMLINK_NOFOLLOW) > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > @@ -1545,6 +1546,21 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl > } > bprm->interp = bprm->filename; > > + /* > + * At this point, security_file_open() has already been called (with > + * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will > + * stop just after the security_bprm_creds_for_exec() call in > + * bprm_execve(). Indeed, the kernel should not try to parse the > + * content of the file with exec_binprm() nor change the calling > + * thread, which means that the following security functions will be > + * not called: > + * - security_bprm_check() > + * - security_bprm_creds_from_file() > + * - security_bprm_committing_creds() > + * - security_bprm_committed_creds() > + */ > + bprm->is_check = !!(flags & AT_EXECVE_CHECK); > + > retval = bprm_mm_init(bprm); > if (!retval) > return bprm; > @@ -1839,7 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > /* Set the unchanging part of bprm->cred */ > retval = security_bprm_creds_for_exec(bprm); > - if (retval) > + if (retval || bprm->is_check) > goto out; > > retval = exec_binprm(bprm); > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > index e6c00e860951..8ff0eb3644a1 100644 > --- a/include/linux/binfmts.h > +++ b/include/linux/binfmts.h > @@ -42,7 +42,12 @@ struct linux_binprm { > * Set when errors can no longer be returned to the > * original userspace. > */ > - point_of_no_return:1; > + point_of_no_return:1, > + /* > + * Set by user space to check executability according to the > + * caller's environment. > + */ > + is_check:1; > struct file *executable; /* Executable to pass to the interpreter */ > struct file *interpreter; > struct file *file; > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > index 87e2dec79fea..2e87f2e3a79f 100644 > --- a/include/uapi/linux/fcntl.h > +++ b/include/uapi/linux/fcntl.h > @@ -154,6 +154,10 @@ > usable with open_by_handle_at(2). */ > #define AT_HANDLE_MNT_ID_UNIQUE 0x001 /* Return the u64 unique mount ID. */ > > +/* Flags for execveat2(2). */ > +#define AT_EXECVE_CHECK 0x10000 /* Only perform a check if execution > + would be allowed. */ > + > #if defined(__KERNEL__) > #define AT_GETATTR_NOSEC 0x80000000 > #endif > diff --git a/kernel/audit.h b/kernel/audit.h > index a60d2840559e..8ebdabd2ab81 100644 > --- a/kernel/audit.h > +++ b/kernel/audit.h > @@ -197,6 +197,7 @@ struct audit_context { > struct open_how openat2; > struct { > int argc; > + bool is_check; > } execve; > struct { > char *name; > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > index cd57053b4a69..8d9ba5600cf2 100644 > --- a/kernel/auditsc.c > +++ b/kernel/auditsc.c > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm) > > context->type = AUDIT_EXECVE; > context->execve.argc = bprm->argc; > + context->execve.is_check = bprm->is_check; Where is execve.is_check used ? > } > > > diff --git a/security/security.c b/security/security.c > index c5981e558bc2..456361ec249d 100644 > --- a/security/security.c > +++ b/security/security.c > @@ -1249,6 +1249,12 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages) > * to 1 if AT_SECURE should be set to request libc enable secure mode. @bprm > * contains the linux_binprm structure. > * > + * If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is > + * set. The result must be the same as without this flag even if the execution > + * will never really happen and @bprm will always be dropped. > + * > + * This hook must not change current->cred, only @bprm->cred. > + * > * Return: Returns 0 if the hook is successful and permission is granted. > */ > int security_bprm_creds_for_exec(struct linux_binprm *bprm) > @@ -3100,6 +3106,10 @@ int security_file_receive(struct file *file) > * Save open-time permission checking state for later use upon file_permission, > * and recheck access if anything has changed since inode_permission. > * > + * We can check if a file is opened for execution (e.g. execve(2) call), either > + * directly or indirectly (e.g. ELF's ld.so) by checking file->f_flags & > + * __FMODE_EXEC . > + * > * Return: Returns 0 if permission is granted. > */ > int security_file_open(struct file *file) > -- > 2.47.0 > >
On Tue, Nov 19, 2024 at 05:17:00PM -0800, Jeff Xu wrote: > On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would > > be allowed for execution. The main use case is for script interpreters > > and dynamic linkers to check execution permission according to the > > kernel's security policy. Another use case is to add context to access > > logs e.g., which script (instead of interpreter) accessed a file. As > > any executable code, scripts could also use this check [1]. > > > > This is different from faccessat(2) + X_OK which only checks a subset of > > access rights (i.e. inode permission and mount options for regular > > files), but not the full context (e.g. all LSM access checks). The main > > use case for access(2) is for SUID processes to (partially) check access > > on behalf of their caller. The main use case for execveat(2) + > > AT_EXECVE_CHECK is to check if a script execution would be allowed, > > according to all the different restrictions in place. Because the use > > of AT_EXECVE_CHECK follows the exact kernel semantic as for a real > > execution, user space gets the same error codes. > > > > An interesting point of using execveat(2) instead of openat2(2) is that > > it decouples the check from the enforcement. Indeed, the security check > > can be logged (e.g. with audit) without blocking an execution > > environment not yet ready to enforce a strict security policy. > > > > LSMs can control or log execution requests with > > security_bprm_creds_for_exec(). However, to enforce a consistent and > > complete access control (e.g. on binary's dependencies) LSMs should > > restrict file executability, or mesure executed files, with > > security_file_open() by checking file->f_flags & __FMODE_EXEC. > > > > Because AT_EXECVE_CHECK is dedicated to user space interpreters, it > > doesn't make sense for the kernel to parse the checked files, look for > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > > if the format is unknown. Because of that, security_bprm_check() is > > never called when AT_EXECVE_CHECK is used. > > > > It should be noted that script interpreters cannot directly use > > execveat(2) (without this new AT_EXECVE_CHECK flag) because this could > > lead to unexpected behaviors e.g., `python script.sh` could lead to Bash > > being executed to interpret the script. Unlike the kernel, script > > interpreters may just interpret the shebang as a simple comment, which > > should not change for backward compatibility reasons. > > > > Because scripts or libraries files might not currently have the > > executable permission set, or because we might want specific users to be > > allowed to run arbitrary scripts, the following patch provides a dynamic > > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and > > SECBIT_EXEC_DENY_INTERACTIVE securebits. > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > > This patch has been used for more than a decade with customized script > > interpreters. Some examples can be found here: > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC > > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > Cc: Christian Brauner <brauner@kernel.org> > > Cc: Kees Cook <keescook@chromium.org> > > Cc: Paul Moore <paul@paul-moore.com> > > Reviewed-by: Serge Hallyn <serge@hallyn.com> > > Link: https://docs.python.org/3/library/io.html#io.open_code [1] > > Signed-off-by: Mickaël Salaün <mic@digikod.net> > > Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net > > --- > > > > Changes since v20: > > * Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and > > Serge Hallyn. > > * Move the UAPI documentation to a dedicated RST file. > > * Add Reviewed-by: Serge Hallyn > > > > Changes since v19: > > * Remove mention of "role transition" as suggested by Andy. > > * Highlight the difference between security_bprm_creds_for_exec() and > > the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as > > discussed with Jeff. > > * Improve documentation both in UAPI comments and kernel comments > > (requested by Kees). > > > > New design since v18: > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net > > --- > > Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++ > > Documentation/userspace-api/index.rst | 1 + > > fs/exec.c | 20 +++++++++++-- > > include/linux/binfmts.h | 7 ++++- > > include/uapi/linux/fcntl.h | 4 +++ > > kernel/audit.h | 1 + > > kernel/auditsc.c | 1 + > > security/security.c | 10 +++++++ > > 8 files changed, 75 insertions(+), 3 deletions(-) > > create mode 100644 Documentation/userspace-api/check_exec.rst > > > > diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst > > new file mode 100644 > > index 000000000000..ad1aeaa5f6c0 > > --- /dev/null > > +++ b/Documentation/userspace-api/check_exec.rst > > @@ -0,0 +1,34 @@ > > +=================== > > +Executability check > > +=================== > > + > > +AT_EXECVE_CHECK > > +=============== > > + > > +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a > > +check on a regular file and returns 0 if execution of this file would be > > +allowed, ignoring the file format and then the related interpreter dependencies > > +(e.g. ELF libraries, script's shebang). > > + > > +Programs should always perform this check to apply kernel-level checks against > > +files that are not directly executed by the kernel but passed to a user space > > +interpreter instead. All files that contain executable code, from the point of > > +view of the interpreter, should be checked. However the result of this check > > +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or > > +``SECBIT_EXEC_DENY_INTERACTIVE.``. > Regarding "should only" > Userspace (e.g. libc) could decide to enforce even when > SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing > doesn't make sense. User space is always in control, but I don't think it would be wise to not follow the configuration securebits (in a generic system) because this could result to unattended behaviors (I don't have a specific one in mind but...). That being said, configuration and checks are standalones and specific/tailored systems are free to do the checks they want. > When SECBIT_EXEC_RESTRICT_FILE=1, userspace is bound to enforce. > > > + > > +The main purpose of this flag is to improve the security and consistency of an > > +execution environment to ensure that direct file execution (e.g. > > +``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to > > +the same result. For instance, this can be used to check if a file is > > +trustworthy according to the caller's environment. > > + > > +In a secure environment, libraries and any executable dependencies should also > > +be checked. For instance, dynamic linking should make sure that all libraries > > +are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``). > > +For such secure execution environment to make sense, only trusted code should > > +be executable, which also requires integrity guarantees. > > + > > +To avoid race conditions leading to time-of-check to time-of-use issues, > > +``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a > > +file descriptor instead of a path. > > diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst > > index 274cc7546efc..6272bcf11296 100644 > > --- a/Documentation/userspace-api/index.rst > > +++ b/Documentation/userspace-api/index.rst > > @@ -35,6 +35,7 @@ Security-related interfaces > > mfd_noexec > > spec_ctrl > > tee > > + check_exec > > > > Devices and I/O > > =============== > > diff --git a/fs/exec.c b/fs/exec.c > > index 6c53920795c2..bb83b6a39530 100644 > > --- a/fs/exec.c > > +++ b/fs/exec.c > > @@ -891,7 +891,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > > .lookup_flags = LOOKUP_FOLLOW, > > }; > > > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > > + if ((flags & > > + ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0) > > return ERR_PTR(-EINVAL); > > if (flags & AT_SYMLINK_NOFOLLOW) > > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > > @@ -1545,6 +1546,21 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl > > } > > bprm->interp = bprm->filename; > > > > + /* > > + * At this point, security_file_open() has already been called (with > > + * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will > > + * stop just after the security_bprm_creds_for_exec() call in > > + * bprm_execve(). Indeed, the kernel should not try to parse the > > + * content of the file with exec_binprm() nor change the calling > > + * thread, which means that the following security functions will be > > + * not called: > > + * - security_bprm_check() > > + * - security_bprm_creds_from_file() > > + * - security_bprm_committing_creds() > > + * - security_bprm_committed_creds() > > + */ > > + bprm->is_check = !!(flags & AT_EXECVE_CHECK); > > + > > retval = bprm_mm_init(bprm); > > if (!retval) > > return bprm; > > @@ -1839,7 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > > > /* Set the unchanging part of bprm->cred */ > > retval = security_bprm_creds_for_exec(bprm); > > - if (retval) > > + if (retval || bprm->is_check) > > goto out; > > > > retval = exec_binprm(bprm); > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > > index e6c00e860951..8ff0eb3644a1 100644 > > --- a/include/linux/binfmts.h > > +++ b/include/linux/binfmts.h > > @@ -42,7 +42,12 @@ struct linux_binprm { > > * Set when errors can no longer be returned to the > > * original userspace. > > */ > > - point_of_no_return:1; > > + point_of_no_return:1, > > + /* > > + * Set by user space to check executability according to the > > + * caller's environment. > > + */ > > + is_check:1; > > struct file *executable; /* Executable to pass to the interpreter */ > > struct file *interpreter; > > struct file *file; > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > > index 87e2dec79fea..2e87f2e3a79f 100644 > > --- a/include/uapi/linux/fcntl.h > > +++ b/include/uapi/linux/fcntl.h > > @@ -154,6 +154,10 @@ > > usable with open_by_handle_at(2). */ > > #define AT_HANDLE_MNT_ID_UNIQUE 0x001 /* Return the u64 unique mount ID. */ > > > > +/* Flags for execveat2(2). */ > > +#define AT_EXECVE_CHECK 0x10000 /* Only perform a check if execution > > + would be allowed. */ > > + > > #if defined(__KERNEL__) > > #define AT_GETATTR_NOSEC 0x80000000 > > #endif > > diff --git a/kernel/audit.h b/kernel/audit.h > > index a60d2840559e..8ebdabd2ab81 100644 > > --- a/kernel/audit.h > > +++ b/kernel/audit.h > > @@ -197,6 +197,7 @@ struct audit_context { > > struct open_how openat2; > > struct { > > int argc; > > + bool is_check; > > } execve; > > struct { > > char *name; > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > index cd57053b4a69..8d9ba5600cf2 100644 > > --- a/kernel/auditsc.c > > +++ b/kernel/auditsc.c > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm) > > > > context->type = AUDIT_EXECVE; > > context->execve.argc = bprm->argc; > > + context->execve.is_check = bprm->is_check; > Where is execve.is_check used ? It is used in bprm_execve(), exposed to the audit framework, and potentially used by LSMs. > > > > } > > > > > > diff --git a/security/security.c b/security/security.c > > index c5981e558bc2..456361ec249d 100644 > > --- a/security/security.c > > +++ b/security/security.c > > @@ -1249,6 +1249,12 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages) > > * to 1 if AT_SECURE should be set to request libc enable secure mode. @bprm > > * contains the linux_binprm structure. > > * > > + * If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is > > + * set. The result must be the same as without this flag even if the execution > > + * will never really happen and @bprm will always be dropped. > > + * > > + * This hook must not change current->cred, only @bprm->cred. > > + * > > * Return: Returns 0 if the hook is successful and permission is granted. > > */ > > int security_bprm_creds_for_exec(struct linux_binprm *bprm) > > @@ -3100,6 +3106,10 @@ int security_file_receive(struct file *file) > > * Save open-time permission checking state for later use upon file_permission, > > * and recheck access if anything has changed since inode_permission. > > * > > + * We can check if a file is opened for execution (e.g. execve(2) call), either > > + * directly or indirectly (e.g. ELF's ld.so) by checking file->f_flags & > > + * __FMODE_EXEC . > > + * > > * Return: Returns 0 if permission is granted. > > */ > > int security_file_open(struct file *file) > > -- > > 2.47.0 > > > >
On Wed, Nov 20, 2024 at 1:42 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Tue, Nov 19, 2024 at 05:17:00PM -0800, Jeff Xu wrote: > > On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would > > > be allowed for execution. The main use case is for script interpreters > > > and dynamic linkers to check execution permission according to the > > > kernel's security policy. Another use case is to add context to access > > > logs e.g., which script (instead of interpreter) accessed a file. As > > > any executable code, scripts could also use this check [1]. > > > > > > This is different from faccessat(2) + X_OK which only checks a subset of > > > access rights (i.e. inode permission and mount options for regular > > > files), but not the full context (e.g. all LSM access checks). The main > > > use case for access(2) is for SUID processes to (partially) check access > > > on behalf of their caller. The main use case for execveat(2) + > > > AT_EXECVE_CHECK is to check if a script execution would be allowed, > > > according to all the different restrictions in place. Because the use > > > of AT_EXECVE_CHECK follows the exact kernel semantic as for a real > > > execution, user space gets the same error codes. > > > > > > An interesting point of using execveat(2) instead of openat2(2) is that > > > it decouples the check from the enforcement. Indeed, the security check > > > can be logged (e.g. with audit) without blocking an execution > > > environment not yet ready to enforce a strict security policy. > > > > > > LSMs can control or log execution requests with > > > security_bprm_creds_for_exec(). However, to enforce a consistent and > > > complete access control (e.g. on binary's dependencies) LSMs should > > > restrict file executability, or mesure executed files, with > > > security_file_open() by checking file->f_flags & __FMODE_EXEC. > > > > > > Because AT_EXECVE_CHECK is dedicated to user space interpreters, it > > > doesn't make sense for the kernel to parse the checked files, look for > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > > > if the format is unknown. Because of that, security_bprm_check() is > > > never called when AT_EXECVE_CHECK is used. > > > > > > It should be noted that script interpreters cannot directly use > > > execveat(2) (without this new AT_EXECVE_CHECK flag) because this could > > > lead to unexpected behaviors e.g., `python script.sh` could lead to Bash > > > being executed to interpret the script. Unlike the kernel, script > > > interpreters may just interpret the shebang as a simple comment, which > > > should not change for backward compatibility reasons. > > > > > > Because scripts or libraries files might not currently have the > > > executable permission set, or because we might want specific users to be > > > allowed to run arbitrary scripts, the following patch provides a dynamic > > > configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and > > > SECBIT_EXEC_DENY_INTERACTIVE securebits. > > > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > > > This patch has been used for more than a decade with customized script > > > interpreters. Some examples can be found here: > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC > > > > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > > Cc: Christian Brauner <brauner@kernel.org> > > > Cc: Kees Cook <keescook@chromium.org> > > > Cc: Paul Moore <paul@paul-moore.com> > > > Reviewed-by: Serge Hallyn <serge@hallyn.com> > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1] > > > Signed-off-by: Mickaël Salaün <mic@digikod.net> > > > Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net > > > --- > > > > > > Changes since v20: > > > * Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and > > > Serge Hallyn. > > > * Move the UAPI documentation to a dedicated RST file. > > > * Add Reviewed-by: Serge Hallyn > > > > > > Changes since v19: > > > * Remove mention of "role transition" as suggested by Andy. > > > * Highlight the difference between security_bprm_creds_for_exec() and > > > the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as > > > discussed with Jeff. > > > * Improve documentation both in UAPI comments and kernel comments > > > (requested by Kees). > > > > > > New design since v18: > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net > > > --- > > > Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++ > > > Documentation/userspace-api/index.rst | 1 + > > > fs/exec.c | 20 +++++++++++-- > > > include/linux/binfmts.h | 7 ++++- > > > include/uapi/linux/fcntl.h | 4 +++ > > > kernel/audit.h | 1 + > > > kernel/auditsc.c | 1 + > > > security/security.c | 10 +++++++ > > > 8 files changed, 75 insertions(+), 3 deletions(-) > > > create mode 100644 Documentation/userspace-api/check_exec.rst > > > > > > diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst > > > new file mode 100644 > > > index 000000000000..ad1aeaa5f6c0 > > > --- /dev/null > > > +++ b/Documentation/userspace-api/check_exec.rst > > > @@ -0,0 +1,34 @@ > > > +=================== > > > +Executability check > > > +=================== > > > + > > > +AT_EXECVE_CHECK > > > +=============== > > > + > > > +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a > > > +check on a regular file and returns 0 if execution of this file would be > > > +allowed, ignoring the file format and then the related interpreter dependencies > > > +(e.g. ELF libraries, script's shebang). > > > + > > > +Programs should always perform this check to apply kernel-level checks against > > > +files that are not directly executed by the kernel but passed to a user space > > > +interpreter instead. All files that contain executable code, from the point of > > > +view of the interpreter, should be checked. However the result of this check > > > +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or > > > +``SECBIT_EXEC_DENY_INTERACTIVE.``. > > Regarding "should only" > > Userspace (e.g. libc) could decide to enforce even when > > SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing > > doesn't make sense. > > User space is always in control, but I don't think it would be wise to > not follow the configuration securebits (in a generic system) because > this could result to unattended behaviors (I don't have a specific one > in mind but...). That being said, configuration and checks are > standalones and specific/tailored systems are free to do the checks they > want. > In the case of dynamic linker, we can always enforce honoring the execveat(AT_EXECVE_CHECK) result, right ? I can't think of a case not to, the dynamic linker doesn't need to check the SECBIT_EXEC_RESTRICT_FILE bit. script interpreters need to check this though, because the apps might need to adjust/test the scripts they are calling, so SECBIT_EXEC_RESTRICT_FILE can be used to opt-out the enforcement. > > When SECBIT_EXEC_RESTRICT_FILE=1, userspace is bound to enforce. > > > > > + > > > +The main purpose of this flag is to improve the security and consistency of an > > > +execution environment to ensure that direct file execution (e.g. > > > +``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to > > > +the same result. For instance, this can be used to check if a file is > > > +trustworthy according to the caller's environment. > > > + > > > +In a secure environment, libraries and any executable dependencies should also > > > +be checked. For instance, dynamic linking should make sure that all libraries > > > +are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``). > > > +For such secure execution environment to make sense, only trusted code should > > > +be executable, which also requires integrity guarantees. > > > + > > > +To avoid race conditions leading to time-of-check to time-of-use issues, > > > +``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a > > > +file descriptor instead of a path. > > > diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst > > > index 274cc7546efc..6272bcf11296 100644 > > > --- a/Documentation/userspace-api/index.rst > > > +++ b/Documentation/userspace-api/index.rst > > > @@ -35,6 +35,7 @@ Security-related interfaces > > > mfd_noexec > > > spec_ctrl > > > tee > > > + check_exec > > > > > > Devices and I/O > > > =============== > > > diff --git a/fs/exec.c b/fs/exec.c > > > index 6c53920795c2..bb83b6a39530 100644 > > > --- a/fs/exec.c > > > +++ b/fs/exec.c > > > @@ -891,7 +891,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > > > .lookup_flags = LOOKUP_FOLLOW, > > > }; > > > > > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > > > + if ((flags & > > > + ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0) > > > return ERR_PTR(-EINVAL); > > > if (flags & AT_SYMLINK_NOFOLLOW) > > > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > > > @@ -1545,6 +1546,21 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl > > > } > > > bprm->interp = bprm->filename; > > > > > > + /* > > > + * At this point, security_file_open() has already been called (with > > > + * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will > > > + * stop just after the security_bprm_creds_for_exec() call in > > > + * bprm_execve(). Indeed, the kernel should not try to parse the > > > + * content of the file with exec_binprm() nor change the calling > > > + * thread, which means that the following security functions will be > > > + * not called: > > > + * - security_bprm_check() > > > + * - security_bprm_creds_from_file() > > > + * - security_bprm_committing_creds() > > > + * - security_bprm_committed_creds() > > > + */ > > > + bprm->is_check = !!(flags & AT_EXECVE_CHECK); > > > + > > > retval = bprm_mm_init(bprm); > > > if (!retval) > > > return bprm; > > > @@ -1839,7 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > > > > > /* Set the unchanging part of bprm->cred */ > > > retval = security_bprm_creds_for_exec(bprm); > > > - if (retval) > > > + if (retval || bprm->is_check) > > > goto out; > > > > > > retval = exec_binprm(bprm); > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > > > index e6c00e860951..8ff0eb3644a1 100644 > > > --- a/include/linux/binfmts.h > > > +++ b/include/linux/binfmts.h > > > @@ -42,7 +42,12 @@ struct linux_binprm { > > > * Set when errors can no longer be returned to the > > > * original userspace. > > > */ > > > - point_of_no_return:1; > > > + point_of_no_return:1, > > > + /* > > > + * Set by user space to check executability according to the > > > + * caller's environment. > > > + */ > > > + is_check:1; > > > struct file *executable; /* Executable to pass to the interpreter */ > > > struct file *interpreter; > > > struct file *file; > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > > > index 87e2dec79fea..2e87f2e3a79f 100644 > > > --- a/include/uapi/linux/fcntl.h > > > +++ b/include/uapi/linux/fcntl.h > > > @@ -154,6 +154,10 @@ > > > usable with open_by_handle_at(2). */ > > > #define AT_HANDLE_MNT_ID_UNIQUE 0x001 /* Return the u64 unique mount ID. */ > > > > > > +/* Flags for execveat2(2). */ > > > +#define AT_EXECVE_CHECK 0x10000 /* Only perform a check if execution > > > + would be allowed. */ > > > + > > > #if defined(__KERNEL__) > > > #define AT_GETATTR_NOSEC 0x80000000 > > > #endif > > > diff --git a/kernel/audit.h b/kernel/audit.h > > > index a60d2840559e..8ebdabd2ab81 100644 > > > --- a/kernel/audit.h > > > +++ b/kernel/audit.h > > > @@ -197,6 +197,7 @@ struct audit_context { > > > struct open_how openat2; > > > struct { > > > int argc; > > > + bool is_check; > > > } execve; > > > struct { > > > char *name; > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > > index cd57053b4a69..8d9ba5600cf2 100644 > > > --- a/kernel/auditsc.c > > > +++ b/kernel/auditsc.c > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm) > > > > > > context->type = AUDIT_EXECVE; > > > context->execve.argc = bprm->argc; > > > + context->execve.is_check = bprm->is_check; > > Where is execve.is_check used ? > > It is used in bprm_execve(), exposed to the audit framework, and > potentially used by LSMs. > bprm_execve() uses bprm->is_check, not the context->execve.is_check. > > > > > > > } > > > > > > > > > diff --git a/security/security.c b/security/security.c > > > index c5981e558bc2..456361ec249d 100644 > > > --- a/security/security.c > > > +++ b/security/security.c > > > @@ -1249,6 +1249,12 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages) > > > * to 1 if AT_SECURE should be set to request libc enable secure mode. @bprm > > > * contains the linux_binprm structure. > > > * > > > + * If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is > > > + * set. The result must be the same as without this flag even if the execution > > > + * will never really happen and @bprm will always be dropped. > > > + * > > > + * This hook must not change current->cred, only @bprm->cred. > > > + * > > > * Return: Returns 0 if the hook is successful and permission is granted. > > > */ > > > int security_bprm_creds_for_exec(struct linux_binprm *bprm) > > > @@ -3100,6 +3106,10 @@ int security_file_receive(struct file *file) > > > * Save open-time permission checking state for later use upon file_permission, > > > * and recheck access if anything has changed since inode_permission. > > > * > > > + * We can check if a file is opened for execution (e.g. execve(2) call), either > > > + * directly or indirectly (e.g. ELF's ld.so) by checking file->f_flags & > > > + * __FMODE_EXEC . > > > + * > > > * Return: Returns 0 if permission is granted. > > > */ > > > int security_file_open(struct file *file) > > > -- > > > 2.47.0 > > > > > >
diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst new file mode 100644 index 000000000000..ad1aeaa5f6c0 --- /dev/null +++ b/Documentation/userspace-api/check_exec.rst @@ -0,0 +1,34 @@ +=================== +Executability check +=================== + +AT_EXECVE_CHECK +=============== + +Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a +check on a regular file and returns 0 if execution of this file would be +allowed, ignoring the file format and then the related interpreter dependencies +(e.g. ELF libraries, script's shebang). + +Programs should always perform this check to apply kernel-level checks against +files that are not directly executed by the kernel but passed to a user space +interpreter instead. All files that contain executable code, from the point of +view of the interpreter, should be checked. However the result of this check +should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or +``SECBIT_EXEC_DENY_INTERACTIVE.``. + +The main purpose of this flag is to improve the security and consistency of an +execution environment to ensure that direct file execution (e.g. +``./script.sh``) and indirect file execution (e.g. ``sh script.sh``) lead to +the same result. For instance, this can be used to check if a file is +trustworthy according to the caller's environment. + +In a secure environment, libraries and any executable dependencies should also +be checked. For instance, dynamic linking should make sure that all libraries +are allowed for execution to avoid trivial bypass (e.g. using ``LD_PRELOAD``). +For such secure execution environment to make sense, only trusted code should +be executable, which also requires integrity guarantees. + +To avoid race conditions leading to time-of-check to time-of-use issues, +``AT_EXECVE_CHECK`` should be used with ``AT_EMPTY_PATH`` to check against a +file descriptor instead of a path. diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst index 274cc7546efc..6272bcf11296 100644 --- a/Documentation/userspace-api/index.rst +++ b/Documentation/userspace-api/index.rst @@ -35,6 +35,7 @@ Security-related interfaces mfd_noexec spec_ctrl tee + check_exec Devices and I/O =============== diff --git a/fs/exec.c b/fs/exec.c index 6c53920795c2..bb83b6a39530 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -891,7 +891,8 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) .lookup_flags = LOOKUP_FOLLOW, }; - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) + if ((flags & + ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_EXECVE_CHECK)) != 0) return ERR_PTR(-EINVAL); if (flags & AT_SYMLINK_NOFOLLOW) open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; @@ -1545,6 +1546,21 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl } bprm->interp = bprm->filename; + /* + * At this point, security_file_open() has already been called (with + * __FMODE_EXEC) and access control checks for AT_EXECVE_CHECK will + * stop just after the security_bprm_creds_for_exec() call in + * bprm_execve(). Indeed, the kernel should not try to parse the + * content of the file with exec_binprm() nor change the calling + * thread, which means that the following security functions will be + * not called: + * - security_bprm_check() + * - security_bprm_creds_from_file() + * - security_bprm_committing_creds() + * - security_bprm_committed_creds() + */ + bprm->is_check = !!(flags & AT_EXECVE_CHECK); + retval = bprm_mm_init(bprm); if (!retval) return bprm; @@ -1839,7 +1855,7 @@ static int bprm_execve(struct linux_binprm *bprm) /* Set the unchanging part of bprm->cred */ retval = security_bprm_creds_for_exec(bprm); - if (retval) + if (retval || bprm->is_check) goto out; retval = exec_binprm(bprm); diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index e6c00e860951..8ff0eb3644a1 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -42,7 +42,12 @@ struct linux_binprm { * Set when errors can no longer be returned to the * original userspace. */ - point_of_no_return:1; + point_of_no_return:1, + /* + * Set by user space to check executability according to the + * caller's environment. + */ + is_check:1; struct file *executable; /* Executable to pass to the interpreter */ struct file *interpreter; struct file *file; diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 87e2dec79fea..2e87f2e3a79f 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -154,6 +154,10 @@ usable with open_by_handle_at(2). */ #define AT_HANDLE_MNT_ID_UNIQUE 0x001 /* Return the u64 unique mount ID. */ +/* Flags for execveat2(2). */ +#define AT_EXECVE_CHECK 0x10000 /* Only perform a check if execution + would be allowed. */ + #if defined(__KERNEL__) #define AT_GETATTR_NOSEC 0x80000000 #endif diff --git a/kernel/audit.h b/kernel/audit.h index a60d2840559e..8ebdabd2ab81 100644 --- a/kernel/audit.h +++ b/kernel/audit.h @@ -197,6 +197,7 @@ struct audit_context { struct open_how openat2; struct { int argc; + bool is_check; } execve; struct { char *name; diff --git a/kernel/auditsc.c b/kernel/auditsc.c index cd57053b4a69..8d9ba5600cf2 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm) context->type = AUDIT_EXECVE; context->execve.argc = bprm->argc; + context->execve.is_check = bprm->is_check; } diff --git a/security/security.c b/security/security.c index c5981e558bc2..456361ec249d 100644 --- a/security/security.c +++ b/security/security.c @@ -1249,6 +1249,12 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages) * to 1 if AT_SECURE should be set to request libc enable secure mode. @bprm * contains the linux_binprm structure. * + * If execveat(2) is called with the AT_EXECVE_CHECK flag, bprm->is_check is + * set. The result must be the same as without this flag even if the execution + * will never really happen and @bprm will always be dropped. + * + * This hook must not change current->cred, only @bprm->cred. + * * Return: Returns 0 if the hook is successful and permission is granted. */ int security_bprm_creds_for_exec(struct linux_binprm *bprm) @@ -3100,6 +3106,10 @@ int security_file_receive(struct file *file) * Save open-time permission checking state for later use upon file_permission, * and recheck access if anything has changed since inode_permission. * + * We can check if a file is opened for execution (e.g. execve(2) call), either + * directly or indirectly (e.g. ELF's ld.so) by checking file->f_flags & + * __FMODE_EXEC . + * * Return: Returns 0 if permission is granted. */ int security_file_open(struct file *file)