Message ID | 20240704190137.696169-2-mic@digikod.net (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Paul Moore |
Headers | show |
Series | Script execution control (was O_MAYEXEC) | expand |
On Thu, Jul 04, 2024 at 09:01:33PM +0200, Mickaël Salaün wrote: > Add a new AT_CHECK flag to execveat(2) to check if a file would be > allowed for execution. The main use case is for script interpreters and > dynamic linkers to check execution permission according to the kernel's > security policy. Another use case is to add context to access logs e.g., > which script (instead of interpreter) accessed a file. As any > executable code, scripts could also use this check [1]. > > This is different than faccessat(2) which only checks file access > rights, but not the full context e.g. mount point's noexec, stack limit, > and all potential LSM extra checks (e.g. argv, envp, credentials). > Since the use of AT_CHECK follows the exact kernel semantic as for a > real execution, user space gets the same error codes. Nice! I much prefer this method of going through the exec machinery so we always have a single code path for these kinds of checks. > Because AT_CHECK is dedicated to user space interpreters, it doesn't > make sense for the kernel to parse the checked files, look for > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > if the format is unknown. Because of that, security_bprm_check() is > never called when AT_CHECK is used. I'd like some additional comments in the code that reminds us that access control checks have finished past a certain point. [...] > diff --git a/fs/exec.c b/fs/exec.c > index 40073142288f..ea2a1867afdc 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > .lookup_flags = LOOKUP_FOLLOW, > }; > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0) > return ERR_PTR(-EINVAL); > if (flags & AT_SYMLINK_NOFOLLOW) > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; [...] > + * To avoid race conditions leading to time-of-check to time-of-use issues, > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file > + * descriptor instead of a path. I want this enforced by the kernel. Let's not leave trivial ToCToU foot-guns around. i.e.: if ((flags & AT_CHECK) == AT_CHECK && (flags & AT_EMPTY_PATH) == 0) return ERR_PTR(-EBADF);
On Thu, Jul 04, 2024 at 05:04:03PM -0700, Kees Cook wrote: > On Thu, Jul 04, 2024 at 09:01:33PM +0200, Mickaël Salaün wrote: > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > allowed for execution. The main use case is for script interpreters and > > dynamic linkers to check execution permission according to the kernel's > > security policy. Another use case is to add context to access logs e.g., > > which script (instead of interpreter) accessed a file. As any > > executable code, scripts could also use this check [1]. > > > > This is different than faccessat(2) which only checks file access > > rights, but not the full context e.g. mount point's noexec, stack limit, > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > real execution, user space gets the same error codes. > > Nice! I much prefer this method of going through the exec machinery so > we always have a single code path for these kinds of checks. > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't > > make sense for the kernel to parse the checked files, look for > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > > if the format is unknown. Because of that, security_bprm_check() is > > never called when AT_CHECK is used. > > I'd like some additional comments in the code that reminds us that > access control checks have finished past a certain point. Where in the code? Just before the bprm->is_check assignment? > > [...] > > diff --git a/fs/exec.c b/fs/exec.c > > index 40073142288f..ea2a1867afdc 100644 > > --- a/fs/exec.c > > +++ b/fs/exec.c > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > > .lookup_flags = LOOKUP_FOLLOW, > > }; > > > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0) > > return ERR_PTR(-EINVAL); > > if (flags & AT_SYMLINK_NOFOLLOW) > > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > [...] > > + * To avoid race conditions leading to time-of-check to time-of-use issues, > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file > > + * descriptor instead of a path. > > I want this enforced by the kernel. Let's not leave trivial ToCToU > foot-guns around. i.e.: > > if ((flags & AT_CHECK) == AT_CHECK && (flags & AT_EMPTY_PATH) == 0) > return ERR_PTR(-EBADF); There are valid use cases relying on pathnames. See Linus's comment: https://lore.kernel.org/r/CAHk-=whb=XuU=LGKnJWaa7LOYQz9VwHs8SLfgLbT5sf2VAbX1A@mail.gmail.com
* Mickaël Salaün: > Add a new AT_CHECK flag to execveat(2) to check if a file would be > allowed for execution. The main use case is for script interpreters and > dynamic linkers to check execution permission according to the kernel's > security policy. Another use case is to add context to access logs e.g., > which script (instead of interpreter) accessed a file. As any > executable code, scripts could also use this check [1]. Some distributions no longer set executable bits on most shared objects, which I assume would interfere with AT_CHECK probing for shared objects. Removing the executable bit is attractive because of a combination of two bugs: a binutils wart which until recently always set the entry point address in the ELF header to zero, and the kernel not checking for a zero entry point (maybe in combination with an absent program interpreter) and failing the execve with ELIBEXEC, instead of doing the execve and then faulting at virtual address zero. Removing the executable bit is currently the only way to avoid these confusing crashes, so I understand the temptation. Thanks, Florian
On Fri, Jul 5, 2024 at 3:03 AM Mickaël Salaün <mic@digikod.net> wrote: > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > allowed for execution. The main use case is for script interpreters and > dynamic linkers to check execution permission according to the kernel's > security policy. Another use case is to add context to access logs e.g., > which script (instead of interpreter) accessed a file. As any > executable code, scripts could also use this check [1]. > Can you give a worked-out example of how this is useful? I assume the idea is that a program could open a file, then pass the fd to execveat() to get the kernel's idea of whether it's permissible to execute it. And then the program would interpret the file, which is morally like executing it. And there would be a big warning in the manpage that passing a *path* is subject to a TOCTOU race. This type of usage will do the wrong thing if LSM policy intends to lock down the task if the task were to actually exec the file. I personally think this is a mis-design (let the program doing the exec-ing lock itself down, possibly by querying a policy, but having magic happen on exec seems likely to do the wrong thing more often that it does the wright thing), but that ship sailed a long time ago. So maybe what's actually needed is a rather different API: a way to check *and perform* the security transition for an exec without actually execing. This would need to be done NO_NEW_PRIVS style for reasons that are hopefully obvious, but it would permit: fd = open(some script); if (do_exec_transition_without_exec(fd) != 0) return; // don't actually do it // OK, we may have just lost privileges. But that's okay, because we meant to do that. // Make sure we've munmapped anything sensitive and erased any secrets from memory, // and then interpret the script! I think this would actually be straightforward to implement in the kernel -- one would need to make sure that all the relevant no_new_privs checks are looking in the right place (as the task might not actually have no_new_privs set, but LSM_UNSAFE_NO_NEW_PRIVS would still be set), but I don't see any reason this would be insurmountable, nor do I expect there would be any fundamental problems. > This is different than faccessat(2) which only checks file access > rights, but not the full context e.g. mount point's noexec, stack limit, > and all potential LSM extra checks (e.g. argv, envp, credentials). > Since the use of AT_CHECK follows the exact kernel semantic as for a > real execution, user space gets the same error codes. > > With the information that a script interpreter is about to interpret a > script, an LSM security policy can adjust caller's access rights or log > execution request as for native script execution (e.g. role transition). > This is possible thanks to the call to security_bprm_creds_for_exec(). > > Because LSMs may only change bprm's credentials, use of AT_CHECK with > current kernel code should not be a security issue (e.g. unexpected role > transition). LSMs willing to update the caller's credential could now > do so when bprm->is_check is set. Of course, such policy change should > be in line with the new user space code. > > Because AT_CHECK is dedicated to user space interpreters, it doesn't > make sense for the kernel to parse the checked files, look for > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > if the format is unknown. Because of that, security_bprm_check() is > never called when AT_CHECK is used. > > It should be noted that script interpreters cannot directly use > execveat(2) (without this new AT_CHECK flag) because this could lead to > unexpected behaviors e.g., `python script.sh` could lead to Bash being > executed to interpret the script. Unlike the kernel, script > interpreters may just interpret the shebang as a simple comment, which > should not change for backward compatibility reasons. > > Because scripts or libraries files might not currently have the > executable permission set, or because we might want specific users to be > allowed to run arbitrary scripts, the following patch provides a dynamic > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and > SECBIT_SHOULD_EXEC_RESTRICT securebits. Can you explain what those bits do? And why they're useful? > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > This patch has been used for more than a decade with customized script > interpreters. Some examples can be found here: > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC This one at least returns an fd, so it looks less likely to get misused in a way that adds a TOCTOU race.
On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote: > * Mickaël Salaün: > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > allowed for execution. The main use case is for script interpreters and > > dynamic linkers to check execution permission according to the kernel's > > security policy. Another use case is to add context to access logs e.g., > > which script (instead of interpreter) accessed a file. As any > > executable code, scripts could also use this check [1]. > > Some distributions no longer set executable bits on most shared objects, > which I assume would interfere with AT_CHECK probing for shared objects. A file without the execute permission is not considered as executable by the kernel. The AT_CHECK flag doesn't change this semantic. Please note that this is just a check, not a restriction. See the next patch for the optional policy enforcement. Anyway, we need to define the policy, and for Linux this is done with the file permission bits. So for systems willing to have a consistent execution policy, we need to rely on the same bits. > Removing the executable bit is attractive because of a combination of > two bugs: a binutils wart which until recently always set the entry > point address in the ELF header to zero, and the kernel not checking for > a zero entry point (maybe in combination with an absent program > interpreter) and failing the execve with ELIBEXEC, instead of doing the > execve and then faulting at virtual address zero. Removing the > executable bit is currently the only way to avoid these confusing > crashes, so I understand the temptation. Interesting. Can you please point to the bug report and the fix? I don't see any ELIBEXEC in the kernel. FYI, AT_CHECK doesn't check the content of the file (unlike a full execve call). Anyway, I think we should not design a new kernel interface to work around a current user space bug.
* Mickaël Salaün: > On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote: >> * Mickaël Salaün: >> >> > Add a new AT_CHECK flag to execveat(2) to check if a file would be >> > allowed for execution. The main use case is for script interpreters and >> > dynamic linkers to check execution permission according to the kernel's >> > security policy. Another use case is to add context to access logs e.g., >> > which script (instead of interpreter) accessed a file. As any >> > executable code, scripts could also use this check [1]. >> >> Some distributions no longer set executable bits on most shared objects, >> which I assume would interfere with AT_CHECK probing for shared objects. > > A file without the execute permission is not considered as executable by > the kernel. The AT_CHECK flag doesn't change this semantic. Please > note that this is just a check, not a restriction. See the next patch > for the optional policy enforcement. > > Anyway, we need to define the policy, and for Linux this is done with > the file permission bits. So for systems willing to have a consistent > execution policy, we need to rely on the same bits. Yes, that makes complete sense. I just wanted to point out the odd interaction with the old binutils bug and the (sadly still current) kernel bug. >> Removing the executable bit is attractive because of a combination of >> two bugs: a binutils wart which until recently always set the entry >> point address in the ELF header to zero, and the kernel not checking for >> a zero entry point (maybe in combination with an absent program >> interpreter) and failing the execve with ELIBEXEC, instead of doing the >> execve and then faulting at virtual address zero. Removing the >> executable bit is currently the only way to avoid these confusing >> crashes, so I understand the temptation. > > Interesting. Can you please point to the bug report and the fix? I > don't see any ELIBEXEC in the kernel. The kernel hasn't been fixed yet. I do think this should be fixed, so that distributions can bring back the executable bit. Thanks, Florian
On Sat, Jul 06, 2024 at 04:52:42PM +0800, Andy Lutomirski wrote: > On Fri, Jul 5, 2024 at 3:03 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > allowed for execution. The main use case is for script interpreters and > > dynamic linkers to check execution permission according to the kernel's > > security policy. Another use case is to add context to access logs e.g., > > which script (instead of interpreter) accessed a file. As any > > executable code, scripts could also use this check [1]. > > > > Can you give a worked-out example of how this is useful? Which part? Please take a look at CLIP OS, chromeOS, and PEP 578 use cases and related code (see cover letter). > > I assume the idea is that a program could open a file, then pass the > fd to execveat() to get the kernel's idea of whether it's permissible > to execute it. And then the program would interpret the file, which > is morally like executing it. And there would be a big warning in the > manpage that passing a *path* is subject to a TOCTOU race. yes > > This type of usage will do the wrong thing if LSM policy intends to > lock down the task if the task were to actually exec the file. I Why? LSMs should currently only change the bprm's credentials not the current's credentials. If needed, we can extend the current patch series with LSM specific patches for them to check bprm->is_check. > personally think this is a mis-design (let the program doing the > exec-ing lock itself down, possibly by querying a policy, but having > magic happen on exec seems likely to do the wrong thing more often > that it does the wright thing), but that ship sailed a long time ago. The execveat+AT_CHECK is only a check that doesn't impact the caller. Maybe you're talking about process transition with future LSM changes? In this case, we could add another flag, but I'm convinced it would be confusing for users. Anyway, let LSMs experiment with that and we'll come up with a new flag if needed. The current approach is a good and useful piece to fill a gap in Linux access control systems. > > So maybe what's actually needed is a rather different API: a way to > check *and perform* the security transition for an exec without > actually execing. This would need to be done NO_NEW_PRIVS style for > reasons that are hopefully obvious, but it would permit: NO_NEW_PRIVS is not that obvious in this case because the restrictions are enforced by user space, not the kernel. NO_NEW_PRIVS makes sense to avoid kernel restrictions be requested by a malicious/unprivileged process to change the behavior of a (child) privileged/trusted process. We are not in this configuration here. The only change would be for ptrace, which is a good thing either way and should not harm SUID processes but avoid confused deputy attack for them too. If this is about an LSM changing the caller's credentials, then yes it might want to set additional flags, but that would be specific to their implementation, not part of this patch. > > fd = open(some script); > if (do_exec_transition_without_exec(fd) != 0) > return; // don't actually do it > > // OK, we may have just lost privileges. But that's okay, because we > meant to do that. > // Make sure we've munmapped anything sensitive and erased any secrets > from memory, > // and then interpret the script! > > I think this would actually be straightforward to implement in the > kernel -- one would need to make sure that all the relevant > no_new_privs checks are looking in the right place (as the task might > not actually have no_new_privs set, but LSM_UNSAFE_NO_NEW_PRIVS would > still be set), but I don't see any reason this would be > insurmountable, nor do I expect there would be any fundamental > problems. OK, that's what is described below with security_bprm_creds_for_exec(). Each LSM can implement this change with the current patch series, but that should be part of a dedicated patch series per LSM, for those willing to leverage this new feature. > > > > This is different than faccessat(2) which only checks file access > > rights, but not the full context e.g. mount point's noexec, stack limit, > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > real execution, user space gets the same error codes. > > > > With the information that a script interpreter is about to interpret a > > script, an LSM security policy can adjust caller's access rights or log > > execution request as for native script execution (e.g. role transition). > > This is possible thanks to the call to security_bprm_creds_for_exec(). > > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with > > current kernel code should not be a security issue (e.g. unexpected role > > transition). LSMs willing to update the caller's credential could now > > do so when bprm->is_check is set. Of course, such policy change should > > be in line with the new user space code. > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't > > make sense for the kernel to parse the checked files, look for > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > > if the format is unknown. Because of that, security_bprm_check() is > > never called when AT_CHECK is used. > > > > It should be noted that script interpreters cannot directly use > > execveat(2) (without this new AT_CHECK flag) because this could lead to > > unexpected behaviors e.g., `python script.sh` could lead to Bash being > > executed to interpret the script. Unlike the kernel, script > > interpreters may just interpret the shebang as a simple comment, which > > should not change for backward compatibility reasons. > > > > Because scripts or libraries files might not currently have the > > executable permission set, or because we might want specific users to be > > allowed to run arbitrary scripts, the following patch provides a dynamic > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and > > SECBIT_SHOULD_EXEC_RESTRICT securebits. > > Can you explain what those bits do? And why they're useful? I didn't want to duplicate the comments above their definition explaining their usage. Please let me know if it's not enough. > > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > > This patch has been used for more than a decade with customized script > > interpreters. Some examples can be found here: > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC > > This one at least returns an fd, so it looks less likely to get > misused in a way that adds a TOCTOU race. We can use both an FD or a path name with execveat(2). See discussion with Kees and comment from Linus.
On Sat, Jul 06, 2024 at 05:32:12PM +0200, Florian Weimer wrote: > * Mickaël Salaün: > > > On Fri, Jul 05, 2024 at 08:03:14PM +0200, Florian Weimer wrote: > >> * Mickaël Salaün: > >> > >> > Add a new AT_CHECK flag to execveat(2) to check if a file would be > >> > allowed for execution. The main use case is for script interpreters and > >> > dynamic linkers to check execution permission according to the kernel's > >> > security policy. Another use case is to add context to access logs e.g., > >> > which script (instead of interpreter) accessed a file. As any > >> > executable code, scripts could also use this check [1]. > >> > >> Some distributions no longer set executable bits on most shared objects, > >> which I assume would interfere with AT_CHECK probing for shared objects. > > > > A file without the execute permission is not considered as executable by > > the kernel. The AT_CHECK flag doesn't change this semantic. Please > > note that this is just a check, not a restriction. See the next patch > > for the optional policy enforcement. > > > > Anyway, we need to define the policy, and for Linux this is done with > > the file permission bits. So for systems willing to have a consistent > > execution policy, we need to rely on the same bits. > > Yes, that makes complete sense. I just wanted to point out the odd > interaction with the old binutils bug and the (sadly still current) > kernel bug. > > >> Removing the executable bit is attractive because of a combination of > >> two bugs: a binutils wart which until recently always set the entry > >> point address in the ELF header to zero, and the kernel not checking for > >> a zero entry point (maybe in combination with an absent program > >> interpreter) and failing the execve with ELIBEXEC, instead of doing the > >> execve and then faulting at virtual address zero. Removing the > >> executable bit is currently the only way to avoid these confusing > >> crashes, so I understand the temptation. > > > > Interesting. Can you please point to the bug report and the fix? I > > don't see any ELIBEXEC in the kernel. > > The kernel hasn't been fixed yet. I do think this should be fixed, so > that distributions can bring back the executable bit. Can you please point to the mailing list discussion or the bug report?
Hi On Fri, Jul 5, 2024 at 11:03 AM Florian Weimer <fweimer@redhat.com> wrote: > > * Mickaël Salaün: > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > allowed for execution. The main use case is for script interpreters and > > dynamic linkers to check execution permission according to the kernel's > > security policy. Another use case is to add context to access logs e.g., > > which script (instead of interpreter) accessed a file. As any > > executable code, scripts could also use this check [1]. > > Some distributions no longer set executable bits on most shared objects, > which I assume would interfere with AT_CHECK probing for shared objects. > Removing the executable bit is attractive because of a combination of > two bugs: a binutils wart which until recently always set the entry > point address in the ELF header to zero, and the kernel not checking for > a zero entry point (maybe in combination with an absent program > interpreter) and failing the execve with ELIBEXEC, instead of doing the > execve and then faulting at virtual address zero. Removing the > executable bit is currently the only way to avoid these confusing > crashes, so I understand the temptation. > Will dynamic linkers use the execveat(AT_CHECK) to check shared libraries too ? or just the main executable itself. Thanks. -Jeff > Thanks, > Florian >
* Jeff Xu: > Will dynamic linkers use the execveat(AT_CHECK) to check shared > libraries too ? or just the main executable itself. I expect that dynamic linkers will have to do this for everything they map. Usually, that does not include the maim program, but this can happen with explicit loader invocations (“ld.so /bin/true”). Thanks, Florian
On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote: > > * Jeff Xu: > > > Will dynamic linkers use the execveat(AT_CHECK) to check shared > > libraries too ? or just the main executable itself. > > I expect that dynamic linkers will have to do this for everything they > map. Then all the objects (.so, .sh, etc.) will go through the check from execveat's main to security_bprm_creds_for_exec(), some of them might be specific for the main executable ? e.g. ChromeOS uses security_bprm_creds_for_exec to block executable memfd [1], applying this means automatically extending the block to the .so object. I'm not sure if other LSMs need to be updated ? e.g. will SELINUX check for .so with its process transaction policy ? [1] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3834992 -Jeff > Usually, that does not include the maim program, but this can > happen with explicit loader invocations (“ld.so /bin/true”). > > Thanks, > Florian >
On Mon, Jul 08, 2024 at 09:40:45AM -0700, Jeff Xu wrote: > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote: > > > > * Jeff Xu: > > > > > Will dynamic linkers use the execveat(AT_CHECK) to check shared > > > libraries too ? or just the main executable itself. > > > > I expect that dynamic linkers will have to do this for everything they > > map. Correct, that would enable to safely handle LD_PRELOAD for instance. > Then all the objects (.so, .sh, etc.) will go through the check from > execveat's main to security_bprm_creds_for_exec(), some of them might > be specific for the main executable ? > e.g. ChromeOS uses security_bprm_creds_for_exec to block executable > memfd [1], applying this means automatically extending the block to > the .so object. That's a good example of how this AT_CHECK check makes sense. Landlock will probably get a similar (optional) restriction too: https://github.com/landlock-lsm/linux/issues/37 > > I'm not sure if other LSMs need to be updated ? e.g. will SELINUX > check for .so with its process transaction policy ? LSM should not need to be updated with this patch series. However, systems/components/containers enabling this new check should make sure it works with their current policy. > > [1] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3834992 > > -Jeff > > > > Usually, that does not include the maim program, but this can > > happen with explicit loader invocations (“ld.so /bin/true”). > > > > Thanks, > > Florian > >
* Jeff Xu: > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote: >> >> * Jeff Xu: >> >> > Will dynamic linkers use the execveat(AT_CHECK) to check shared >> > libraries too ? or just the main executable itself. >> >> I expect that dynamic linkers will have to do this for everything they >> map. > Then all the objects (.so, .sh, etc.) will go through the check from > execveat's main to security_bprm_creds_for_exec(), some of them might > be specific for the main executable ? If we want to avoid that, we could have an agreed-upon error code which the LSM can signal that it'll never fail AT_CHECK checks, so we only have to perform the extra system call once. Thanks, Florian
On Mon, Jul 8, 2024 at 10:33 AM Florian Weimer <fweimer@redhat.com> wrote: > > * Jeff Xu: > > > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote: > >> > >> * Jeff Xu: > >> > >> > Will dynamic linkers use the execveat(AT_CHECK) to check shared > >> > libraries too ? or just the main executable itself. > >> > >> I expect that dynamic linkers will have to do this for everything they > >> map. > > Then all the objects (.so, .sh, etc.) will go through the check from > > execveat's main to security_bprm_creds_for_exec(), some of them might > > be specific for the main executable ? > > If we want to avoid that, we could have an agreed-upon error code which > the LSM can signal that it'll never fail AT_CHECK checks, so we only > have to perform the extra system call once. > Right, something like that. I would prefer not having AT_CHECK specific code in LSM code as an initial goal, if that works, great. -Jeff > Thanks, > Florian >
On Fri, Jul 05, 2024 at 07:53:10PM +0200, Mickaël Salaün wrote: > On Thu, Jul 04, 2024 at 05:04:03PM -0700, Kees Cook wrote: > > On Thu, Jul 04, 2024 at 09:01:33PM +0200, Mickaël Salaün wrote: > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > allowed for execution. The main use case is for script interpreters and > > > dynamic linkers to check execution permission according to the kernel's > > > security policy. Another use case is to add context to access logs e.g., > > > which script (instead of interpreter) accessed a file. As any > > > executable code, scripts could also use this check [1]. > > > > > > This is different than faccessat(2) which only checks file access > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > real execution, user space gets the same error codes. > > > > Nice! I much prefer this method of going through the exec machinery so > > we always have a single code path for these kinds of checks. > > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't > > > make sense for the kernel to parse the checked files, look for > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > > > if the format is unknown. Because of that, security_bprm_check() is > > > never called when AT_CHECK is used. > > > > I'd like some additional comments in the code that reminds us that > > access control checks have finished past a certain point. > > Where in the code? Just before the bprm->is_check assignment? Yeah, that's what I was thinking.
On Mon, Jul 08, 2024 at 10:52:36AM -0700, Jeff Xu wrote: > On Mon, Jul 8, 2024 at 10:33 AM Florian Weimer <fweimer@redhat.com> wrote: > > > > * Jeff Xu: > > > > > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote: > > >> > > >> * Jeff Xu: > > >> > > >> > Will dynamic linkers use the execveat(AT_CHECK) to check shared > > >> > libraries too ? or just the main executable itself. > > >> > > >> I expect that dynamic linkers will have to do this for everything they > > >> map. > > > Then all the objects (.so, .sh, etc.) will go through the check from > > > execveat's main to security_bprm_creds_for_exec(), some of them might > > > be specific for the main executable ? Yes, we should check every executable code (including seccomp filters) to get a consistent policy. What do you mean by "specific for the main executable"? > > > > If we want to avoid that, we could have an agreed-upon error code which > > the LSM can signal that it'll never fail AT_CHECK checks, so we only > > have to perform the extra system call once. I'm not sure to follow. Either we check executable code or we don't, but it doesn't make sense to only check some parts (except for migration of user space code in a system, which is one purpose of the securebits added with the next patch). The idea with AT_CHECK is to unconditionnaly check executable right the same way it is checked when a file is executed. User space can decide to check that or not according to its policy (i.e. securebits). > > > Right, something like that. > I would prefer not having AT_CHECK specific code in LSM code as an > initial goal, if that works, great. LSMs should not need to change anything, but they are free to implement new access right according to AT_CHECK. > > -Jeff > > > Thanks, > > Florian > >
* Mickaël Salaün: >> > If we want to avoid that, we could have an agreed-upon error code which >> > the LSM can signal that it'll never fail AT_CHECK checks, so we only >> > have to perform the extra system call once. > > I'm not sure to follow. Either we check executable code or we don't, > but it doesn't make sense to only check some parts (except for migration > of user space code in a system, which is one purpose of the securebits > added with the next patch). > > The idea with AT_CHECK is to unconditionnaly check executable right the > same way it is checked when a file is executed. User space can decide > to check that or not according to its policy (i.e. securebits). I meant it purely as a performance optimization, to skip future system calls if we know they won't provide any useful information for this process. In the grand scheme of things, the extra system call probably does not matter because we already have to do costly things like mmap. Thanks, Florian
On Tue, Jul 9, 2024 at 2:18 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Mon, Jul 08, 2024 at 10:52:36AM -0700, Jeff Xu wrote: > > On Mon, Jul 8, 2024 at 10:33 AM Florian Weimer <fweimer@redhat.com> wrote: > > > > > > * Jeff Xu: > > > > > > > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote: > > > >> > > > >> * Jeff Xu: > > > >> > > > >> > Will dynamic linkers use the execveat(AT_CHECK) to check shared > > > >> > libraries too ? or just the main executable itself. > > > >> > > > >> I expect that dynamic linkers will have to do this for everything they > > > >> map. > > > > Then all the objects (.so, .sh, etc.) will go through the check from > > > > execveat's main to security_bprm_creds_for_exec(), some of them might > > > > be specific for the main executable ? > > Yes, we should check every executable code (including seccomp filters) > to get a consistent policy. > > What do you mean by "specific for the main executable"? > I meant: The check is for the exe itself, not .so, etc. For example: /usr/bin/touch is checked. not the shared objects: ldd /usr/bin/touch linux-vdso.so.1 (0x00007ffdc988f000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f59b6757000) /lib64/ld-linux-x86-64.so.2 (0x00007f59b6986000) Basically, I asked if the check can be extended to shared-objects, seccomp filters, etc, without modifying existing LSMs. you pointed out "LSM should not need to be updated with this patch series.", which already answered my question. Thanks. -Jeff -Jeff
On Tue, Jul 09, 2024 at 11:57:27AM -0700, Jeff Xu wrote: > On Tue, Jul 9, 2024 at 2:18 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > On Mon, Jul 08, 2024 at 10:52:36AM -0700, Jeff Xu wrote: > > > On Mon, Jul 8, 2024 at 10:33 AM Florian Weimer <fweimer@redhat.com> wrote: > > > > > > > > * Jeff Xu: > > > > > > > > > On Mon, Jul 8, 2024 at 9:26 AM Florian Weimer <fweimer@redhat.com> wrote: > > > > >> > > > > >> * Jeff Xu: > > > > >> > > > > >> > Will dynamic linkers use the execveat(AT_CHECK) to check shared > > > > >> > libraries too ? or just the main executable itself. > > > > >> > > > > >> I expect that dynamic linkers will have to do this for everything they > > > > >> map. > > > > > Then all the objects (.so, .sh, etc.) will go through the check from > > > > > execveat's main to security_bprm_creds_for_exec(), some of them might > > > > > be specific for the main executable ? > > > > Yes, we should check every executable code (including seccomp filters) > > to get a consistent policy. > > > > What do you mean by "specific for the main executable"? > > > I meant: > > The check is for the exe itself, not .so, etc. > > For example: /usr/bin/touch is checked. > not the shared objects: > ldd /usr/bin/touch > linux-vdso.so.1 (0x00007ffdc988f000) > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f59b6757000) > /lib64/ld-linux-x86-64.so.2 (0x00007f59b6986000) ld.so should be patched to check shared-objects. > > Basically, I asked if the check can be extended to shared-objects, > seccomp filters, etc, without modifying existing LSMs. Yes, the check should be used against any piece of code such as shared-objects, seccomp filters... > you pointed out "LSM should not need to be updated with this patch > series.", which already answered my question. > > Thanks. > -Jeff > > -Jeff
On Tue, Jul 09, 2024 at 12:05:50PM +0200, Florian Weimer wrote: > * Mickaël Salaün: > > >> > If we want to avoid that, we could have an agreed-upon error code which > >> > the LSM can signal that it'll never fail AT_CHECK checks, so we only > >> > have to perform the extra system call once. > > > > I'm not sure to follow. Either we check executable code or we don't, > > but it doesn't make sense to only check some parts (except for migration > > of user space code in a system, which is one purpose of the securebits > > added with the next patch). > > > > The idea with AT_CHECK is to unconditionnaly check executable right the > > same way it is checked when a file is executed. User space can decide > > to check that or not according to its policy (i.e. securebits). > > I meant it purely as a performance optimization, to skip future system > calls if we know they won't provide any useful information for this > process. In the grand scheme of things, the extra system call probably > does not matter because we already have to do costly things like mmap. Indeed, the performance impact of execveat+AT_CHECK should be negligible compared to everything else needed to interpret a script or spawn a process. Moreover, these checks should only be performed when SECBIT_SHOULD_EXEC_CHECK is set for the caller.
On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > allowed for execution. The main use case is for script interpreters and > dynamic linkers to check execution permission according to the kernel's > security policy. Another use case is to add context to access logs e.g., > which script (instead of interpreter) accessed a file. As any > executable code, scripts could also use this check [1]. > > This is different than faccessat(2) which only checks file access > rights, but not the full context e.g. mount point's noexec, stack limit, > and all potential LSM extra checks (e.g. argv, envp, credentials). > Since the use of AT_CHECK follows the exact kernel semantic as for a > real execution, user space gets the same error codes. > So we concluded that execveat(AT_CHECK) will be used to check the exec, shared object, script and config file (such as seccomp config), I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) in different use cases: execveat clearly has less code change, but that also means: we can't add logic specific to exec (i.e. logic that can't be applied to config) for this part (from do_execveat_common to security_bprm_creds_for_exec) in future. This would require some agreement/sign-off, I'm not sure from whom. -------------------------- now looked at user cases (focus on elf for now) 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd) 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so /usr/bin/some.out will pass AT_CHECK 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so /usr/bin/some.out will pass AT_CHECK, however, it uses a custom /tmp/ld.so (I assume this is possible for elf header will set the path for ld.so because kernel has no knowledge of that, and binfmt_elf.c allocate memory for ld.so during execveat call) 4> dlopen(/tmp/a.so) I assume dynamic linker will call execveat(AT_CHECK), before map a.so into memory. For case 1> Alternative solution: Because AT_CHECK is always called, I think we can avoid the first AT_CHECK call, and check during execveat(fd), this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the benefit is that there is no TOCTOU and save one round trip of syscall for a succesful execveat() case. For case 2> dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory. However, the process can all open then mmap() directly, it seems minimal effort for an attacker to walk around such a defence from dynamic linker. Alternative solution: dynamic linker call AT_CHECK for each .so, kernel will save the state (associated with fd) kernel will check fd state at the time of mmap(fd, executable memory) and enforce SECBIT_EXEC_RESTRICT_FILE = 1 Alternative solution 2: a new syscall to load the .so and enforce the AT_CHECK in kernel This also means, for the solution to be complete, we might want to block creation of executable anonymous memory (e.g. by seccomp, ), unless the user space can harden the creation of executable anonymous memory in some way. For case 3> I think binfmt_elf.c in the kernel needs to check the ld.so to make sure it passes AT_CHECK, before loading it into memory. For case 4> same as case 2. Consider those cases: I think: a> relying purely on userspace for enforcement does't seem to be effective, e.g. it is trivial to call open(), then mmap() it into executable memory. b> if both user space and kernel need to call AT_CHECK, the faccessat seems to be a better place for AT_CHECK, e.g. kernel can call do_faccessat(AT_CHECK) and userspace can call faccessat(). This will avoid complicating the execveat() code path. What do you think ? Thanks -Jeff > With the information that a script interpreter is about to interpret a > script, an LSM security policy can adjust caller's access rights or log > execution request as for native script execution (e.g. role transition). > This is possible thanks to the call to security_bprm_creds_for_exec(). > > Because LSMs may only change bprm's credentials, use of AT_CHECK with > current kernel code should not be a security issue (e.g. unexpected role > transition). LSMs willing to update the caller's credential could now > do so when bprm->is_check is set. Of course, such policy change should > be in line with the new user space code. > > Because AT_CHECK is dedicated to user space interpreters, it doesn't > make sense for the kernel to parse the checked files, look for > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > if the format is unknown. Because of that, security_bprm_check() is > never called when AT_CHECK is used. > > It should be noted that script interpreters cannot directly use > execveat(2) (without this new AT_CHECK flag) because this could lead to > unexpected behaviors e.g., `python script.sh` could lead to Bash being > executed to interpret the script. Unlike the kernel, script > interpreters may just interpret the shebang as a simple comment, which > should not change for backward compatibility reasons. > > Because scripts or libraries files might not currently have the > executable permission set, or because we might want specific users to be > allowed to run arbitrary scripts, the following patch provides a dynamic > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and > SECBIT_SHOULD_EXEC_RESTRICT securebits. > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > This patch has been used for more than a decade with customized script > interpreters. Some examples can be found here: > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC > > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Christian Brauner <brauner@kernel.org> > Cc: Kees Cook <keescook@chromium.org> > Cc: Paul Moore <paul@paul-moore.com> > Link: https://docs.python.org/3/library/io.html#io.open_code [1] > Signed-off-by: Mickaël Salaün <mic@digikod.net> > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net > --- > > New design since v18: > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net > --- > fs/exec.c | 5 +++-- > include/linux/binfmts.h | 7 ++++++- > include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++ > kernel/audit.h | 1 + > kernel/auditsc.c | 1 + > 5 files changed, 41 insertions(+), 3 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 40073142288f..ea2a1867afdc 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > .lookup_flags = LOOKUP_FOLLOW, > }; > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0) > return ERR_PTR(-EINVAL); > if (flags & AT_SYMLINK_NOFOLLOW) > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl > bprm->filename = bprm->fdpath; > } > bprm->interp = bprm->filename; > + bprm->is_check = !!(flags & AT_CHECK); > > retval = bprm_mm_init(bprm); > if (!retval) > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > /* Set the unchanging part of bprm->cred */ > retval = security_bprm_creds_for_exec(bprm); > - if (retval) > + if (retval || bprm->is_check) > goto out; > > retval = exec_binprm(bprm); > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > index 70f97f685bff..8ff9c9e33aed 100644 > --- a/include/linux/binfmts.h > +++ b/include/linux/binfmts.h > @@ -42,7 +42,12 @@ struct linux_binprm { > * Set when errors can no longer be returned to the > * original userspace. > */ > - point_of_no_return:1; > + point_of_no_return:1, > + /* > + * Set by user space to check executability according to the > + * caller's environment. > + */ > + is_check:1; > struct file *executable; /* Executable to pass to the interpreter */ > struct file *interpreter; > struct file *file; > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > index c0bcc185fa48..bcd05c59b7df 100644 > --- a/include/uapi/linux/fcntl.h > +++ b/include/uapi/linux/fcntl.h > @@ -118,6 +118,36 @@ > #define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to > compare object identity and may not > be usable to open_by_handle_at(2) */ > + > +/* > + * AT_CHECK only performs a check on a regular file and returns 0 if execution > + * of this file would be allowed, ignoring the file format and then the related > + * interpreter dependencies (e.g. ELF libraries, script's shebang). AT_CHECK > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling > + * thread. See securebits.h documentation. > + * > + * Programs should use this check to apply kernel-level checks against files > + * that are not directly executed by the kernel but directly passed to a user > + * space interpreter instead. All files that contain executable code, from the > + * point of view of the interpreter, should be checked. The main purpose of > + * this flag is to improve the security and consistency of an execution > + * environment to ensure that direct file execution (e.g. ./script.sh) and > + * indirect file execution (e.g. sh script.sh) lead to the same result. For > + * instance, this can be used to check if a file is trustworthy according to > + * the caller's environment. > + * > + * In a secure environment, libraries and any executable dependencies should > + * also be checked. For instance dynamic linking should make sure that all > + * libraries are allowed for execution to avoid trivial bypass (e.g. using > + * LD_PRELOAD). For such secure execution environment to make sense, only > + * trusted code should be executable, which also requires integrity guarantees. > + * > + * To avoid race conditions leading to time-of-check to time-of-use issues, > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file > + * descriptor instead of a path. > + */ > +#define AT_CHECK 0x10000 > + > #if defined(__KERNEL__) > #define AT_GETATTR_NOSEC 0x80000000 > #endif > diff --git a/kernel/audit.h b/kernel/audit.h > index a60d2840559e..8ebdabd2ab81 100644 > --- a/kernel/audit.h > +++ b/kernel/audit.h > @@ -197,6 +197,7 @@ struct audit_context { > struct open_how openat2; > struct { > int argc; > + bool is_check; > } execve; > struct { > char *name; > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > index 6f0d6fb6523f..b6316e284342 100644 > --- a/kernel/auditsc.c > +++ b/kernel/auditsc.c > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm) > > context->type = AUDIT_EXECVE; > context->execve.argc = bprm->argc; > + context->execve.is_check = bprm->is_check; > } > > > -- > 2.45.2 >
On 17/07/2024 07:33, Jeff Xu wrote: > Consider those cases: I think: > a> relying purely on userspace for enforcement does't seem to be > effective, e.g. it is trivial to call open(), then mmap() it into > executable memory. If there's a way to do this without running executable code that had to pass a previous execveat() check, then yeah, it's not effective (e.g. a Python interpreter that *doesn't* enforce execveat() is a trivial way to do it). Once arbitrary code is running, all bets are off. So long as all arbitrary code is being checked itself, it's allowed to do things that would bypass later checks (and it's up to whoever audited it in the first place to prevent this by not giving it the special mark that allows it to pass the check). > b> if both user space and kernel need to call AT_CHECK, the faccessat > seems to be a better place for AT_CHECK, e.g. kernel can call > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will > avoid complicating the execveat() code path. > > What do you think ? > > Thanks > -Jeff
On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote: > On 17/07/2024 07:33, Jeff Xu wrote: > > Consider those cases: I think: > > a> relying purely on userspace for enforcement does't seem to be > > effective, e.g. it is trivial to call open(), then mmap() it into > > executable memory. > > If there's a way to do this without running executable code that had to pass > a previous execveat() check, then yeah, it's not effective (e.g. a Python > interpreter that *doesn't* enforce execveat() is a trivial way to do it). > > Once arbitrary code is running, all bets are off. So long as all arbitrary > code is being checked itself, it's allowed to do things that would bypass > later checks (and it's up to whoever audited it in the first place to > prevent this by not giving it the special mark that allows it to pass the > check). Exactly. As explained in the patches, one crucial prerequisite is that the executable code is trusted, and the system must provide integrity guarantees. We cannot do anything without that. This patches series is a building block to fix a blind spot on Linux systems to be able to fully control executability.
On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > allowed for execution. The main use case is for script interpreters and > > dynamic linkers to check execution permission according to the kernel's > > security policy. Another use case is to add context to access logs e.g., > > which script (instead of interpreter) accessed a file. As any > > executable code, scripts could also use this check [1]. > > > > This is different than faccessat(2) which only checks file access > > rights, but not the full context e.g. mount point's noexec, stack limit, > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > real execution, user space gets the same error codes. > > > So we concluded that execveat(AT_CHECK) will be used to check the > exec, shared object, script and config file (such as seccomp config), "config file" that contains executable code. > I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) in > different use cases: > > execveat clearly has less code change, but that also means: we can't > add logic specific to exec (i.e. logic that can't be applied to > config) for this part (from do_execveat_common to > security_bprm_creds_for_exec) in future. This would require some > agreement/sign-off, I'm not sure from whom. I'm not sure to follow. We could still add new flags, but for now I don't see use cases. This patch series is not meant to handle all possible "trust checks", only executable code, which makes sense for the kernel. If we want other checks, we'll need to clearly define their semantic and align with the kernel. faccessat2(2) might be used to check other file properties, but the executable property is not only defined by the file attributes. > > -------------------------- > now looked at user cases (focus on elf for now) > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd) > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so > /usr/bin/some.out will pass AT_CHECK > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom > /tmp/ld.so (I assume this is possible for elf header will set the > path for ld.so because kernel has no knowledge of that, and > binfmt_elf.c allocate memory for ld.so during execveat call) > > 4> dlopen(/tmp/a.so) > I assume dynamic linker will call execveat(AT_CHECK), before map a.so > into memory. > > For case 1> > Alternative solution: Because AT_CHECK is always called, I think we > can avoid the first AT_CHECK call, and check during execveat(fd), There is no need to use AT_CHECK if we're going to call execveat(2) on the same file descriptor. By design, AT_CHECK is implicit for any execve(2). > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the > benefit is that there is no TOCTOU and save one round trip of syscall > for a succesful execveat() case. As long as user space uses the same file descriptor, there is no TOCTOU. SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines the user space security policy. The kernel already enforces the same security policy for any execve(2), whatever are the calling process's securebits. > > For case 2> > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory. > However, the process can all open then mmap() directly, it seems > minimal effort for an attacker to walk around such a defence from > dynamic linker. Which process? What do you mean by "can all open then mmap() directly"? In this context the dynamic linker (like its parent processes) is trusted (guaranteed by the system). For case 2, the dynamic linker must check with AT_CHECK all files that will be mapped, which include /usr/bin/some.out and /tmp/a.so > > Alternative solution: > dynamic linker call AT_CHECK for each .so, kernel will save the state > (associated with fd) > kernel will check fd state at the time of mmap(fd, executable memory) > and enforce SECBIT_EXEC_RESTRICT_FILE = 1 The idea with AT_CHECK is that there is no kernel side effect, no extra kernel state, and the semantic is the same as with execve(2). This also enables us to check file's executable permission and ignore it, which is useful in a "permissive mode" when preparing for a migration without breaking a system, or to do extra integrity checks. BTW, this use case would also be more complex with a new openat2(2) flag like the original O_MAYEXEC. > > Alternative solution 2: > a new syscall to load the .so and enforce the AT_CHECK in kernel A new syscall would be overkill for this feature. Please see Linus's comment. > > This also means, for the solution to be complete, we might want to > block creation of executable anonymous memory (e.g. by seccomp, ), How seccomp could create anonymous memory in user space? seccomp filters should be treated (and checked with AT_CHECK) as executable code anyway. > unless the user space can harden the creation of executable anonymous > memory in some way. User space is already in charge of mmapping its own memory. I don't see what is missing. > > For case 3> > I think binfmt_elf.c in the kernel needs to check the ld.so to make > sure it passes AT_CHECK, before loading it into memory. All ELF dependencies are opened and checked with open_exec(), which perform the main executability checks (with the __FMODE_EXEC flag). Did I miss something? However, we must be careful with programs using the (deprecated) uselib(2). They should also check with AT_CHECK because this syscall opens the shared library without __FMODE_EXEC (similar to a simple file open). See https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/ > > For case 4> > same as case 2. > > Consider those cases: I think: > a> relying purely on userspace for enforcement does't seem to be > effective, e.g. it is trivial to call open(), then mmap() it into > executable memory. As Steve explained (and is also explained in the patches), it is trivial if the attacker can already execute its own code, which is too late to enforce any script execution control. > b> if both user space and kernel need to call AT_CHECK, the faccessat > seems to be a better place for AT_CHECK, e.g. kernel can call > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will > avoid complicating the execveat() code path. A previous version of this patches series already patched faccessat(2), but this is not the right place. faccessat2(2) is dedicated to check file permissions, not executability (e.g. with mount's noexec). > > What do you think ? I think there are some misunderstandings. Please let me know if it's clearer now. > > Thanks > -Jeff > > > With the information that a script interpreter is about to interpret a > > script, an LSM security policy can adjust caller's access rights or log > > execution request as for native script execution (e.g. role transition). > > This is possible thanks to the call to security_bprm_creds_for_exec(). > > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with > > current kernel code should not be a security issue (e.g. unexpected role > > transition). LSMs willing to update the caller's credential could now > > do so when bprm->is_check is set. Of course, such policy change should > > be in line with the new user space code. > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't > > make sense for the kernel to parse the checked files, look for > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > > if the format is unknown. Because of that, security_bprm_check() is > > never called when AT_CHECK is used. > > > > It should be noted that script interpreters cannot directly use > > execveat(2) (without this new AT_CHECK flag) because this could lead to > > unexpected behaviors e.g., `python script.sh` could lead to Bash being > > executed to interpret the script. Unlike the kernel, script > > interpreters may just interpret the shebang as a simple comment, which > > should not change for backward compatibility reasons. > > > > Because scripts or libraries files might not currently have the > > executable permission set, or because we might want specific users to be > > allowed to run arbitrary scripts, the following patch provides a dynamic > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and > > SECBIT_SHOULD_EXEC_RESTRICT securebits. > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > > This patch has been used for more than a decade with customized script > > interpreters. Some examples can be found here: > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC > > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > Cc: Christian Brauner <brauner@kernel.org> > > Cc: Kees Cook <keescook@chromium.org> > > Cc: Paul Moore <paul@paul-moore.com> > > Link: https://docs.python.org/3/library/io.html#io.open_code [1] > > Signed-off-by: Mickaël Salaün <mic@digikod.net> > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net > > --- > > > > New design since v18: > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net > > --- > > fs/exec.c | 5 +++-- > > include/linux/binfmts.h | 7 ++++++- > > include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++ > > kernel/audit.h | 1 + > > kernel/auditsc.c | 1 + > > 5 files changed, 41 insertions(+), 3 deletions(-) > > > > diff --git a/fs/exec.c b/fs/exec.c > > index 40073142288f..ea2a1867afdc 100644 > > --- a/fs/exec.c > > +++ b/fs/exec.c > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > > .lookup_flags = LOOKUP_FOLLOW, > > }; > > > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0) > > return ERR_PTR(-EINVAL); > > if (flags & AT_SYMLINK_NOFOLLOW) > > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl > > bprm->filename = bprm->fdpath; > > } > > bprm->interp = bprm->filename; > > + bprm->is_check = !!(flags & AT_CHECK); > > > > retval = bprm_mm_init(bprm); > > if (!retval) > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > > > /* Set the unchanging part of bprm->cred */ > > retval = security_bprm_creds_for_exec(bprm); > > - if (retval) > > + if (retval || bprm->is_check) > > goto out; > > > > retval = exec_binprm(bprm); > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > > index 70f97f685bff..8ff9c9e33aed 100644 > > --- a/include/linux/binfmts.h > > +++ b/include/linux/binfmts.h > > @@ -42,7 +42,12 @@ struct linux_binprm { > > * Set when errors can no longer be returned to the > > * original userspace. > > */ > > - point_of_no_return:1; > > + point_of_no_return:1, > > + /* > > + * Set by user space to check executability according to the > > + * caller's environment. > > + */ > > + is_check:1; > > struct file *executable; /* Executable to pass to the interpreter */ > > struct file *interpreter; > > struct file *file; > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > > index c0bcc185fa48..bcd05c59b7df 100644 > > --- a/include/uapi/linux/fcntl.h > > +++ b/include/uapi/linux/fcntl.h > > @@ -118,6 +118,36 @@ > > #define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to > > compare object identity and may not > > be usable to open_by_handle_at(2) */ > > + > > +/* > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution > > + * of this file would be allowed, ignoring the file format and then the related > > + * interpreter dependencies (e.g. ELF libraries, script's shebang). AT_CHECK > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling > > + * thread. See securebits.h documentation. > > + * > > + * Programs should use this check to apply kernel-level checks against files > > + * that are not directly executed by the kernel but directly passed to a user > > + * space interpreter instead. All files that contain executable code, from the > > + * point of view of the interpreter, should be checked. The main purpose of > > + * this flag is to improve the security and consistency of an execution > > + * environment to ensure that direct file execution (e.g. ./script.sh) and > > + * indirect file execution (e.g. sh script.sh) lead to the same result. For > > + * instance, this can be used to check if a file is trustworthy according to > > + * the caller's environment. > > + * > > + * In a secure environment, libraries and any executable dependencies should > > + * also be checked. For instance dynamic linking should make sure that all > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using > > + * LD_PRELOAD). For such secure execution environment to make sense, only > > + * trusted code should be executable, which also requires integrity guarantees. > > + * > > + * To avoid race conditions leading to time-of-check to time-of-use issues, > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file > > + * descriptor instead of a path. > > + */ > > +#define AT_CHECK 0x10000 > > + > > #if defined(__KERNEL__) > > #define AT_GETATTR_NOSEC 0x80000000 > > #endif > > diff --git a/kernel/audit.h b/kernel/audit.h > > index a60d2840559e..8ebdabd2ab81 100644 > > --- a/kernel/audit.h > > +++ b/kernel/audit.h > > @@ -197,6 +197,7 @@ struct audit_context { > > struct open_how openat2; > > struct { > > int argc; > > + bool is_check; > > } execve; > > struct { > > char *name; > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > index 6f0d6fb6523f..b6316e284342 100644 > > --- a/kernel/auditsc.c > > +++ b/kernel/auditsc.c > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm) > > > > context->type = AUDIT_EXECVE; > > context->execve.argc = bprm->argc; > > + context->execve.is_check = bprm->is_check; > > } > > > > > > -- > > 2.45.2 > > >
> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote: > > On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote: >>> On 17/07/2024 07:33, Jeff Xu wrote: >>> Consider those cases: I think: >>> a> relying purely on userspace for enforcement does't seem to be >>> effective, e.g. it is trivial to call open(), then mmap() it into >>> executable memory. >> >> If there's a way to do this without running executable code that had to pass >> a previous execveat() check, then yeah, it's not effective (e.g. a Python >> interpreter that *doesn't* enforce execveat() is a trivial way to do it). >> >> Once arbitrary code is running, all bets are off. So long as all arbitrary >> code is being checked itself, it's allowed to do things that would bypass >> later checks (and it's up to whoever audited it in the first place to >> prevent this by not giving it the special mark that allows it to pass the >> check). > > Exactly. As explained in the patches, one crucial prerequisite is that > the executable code is trusted, and the system must provide integrity > guarantees. We cannot do anything without that. This patches series is > a building block to fix a blind spot on Linux systems to be able to > fully control executability. Circling back to my previous comment (did that ever get noticed?), I don’t think this is quite right: https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/ On a basic system configuration, a given path either may or may not be executed. And maybe that path has some integrity check (dm-verity, etc). So the kernel should tell the interpreter/loader whether the target may be executed. All fine. But I think the more complex cases are more interesting, and the “execute a program” process IS NOT BINARY. An attempt to execute can be rejected outright, or it can be allowed *with a change to creds or security context*. It would be entirely reasonable to have a policy that allows execution of non-integrity-checked files but in a very locked down context only. So… shouldn’t a patch series to this effect actually support this?
On Wed, Jul 17, 2024 at 3:00 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote: > > On 17/07/2024 07:33, Jeff Xu wrote: > > > Consider those cases: I think: > > > a> relying purely on userspace for enforcement does't seem to be > > > effective, e.g. it is trivial to call open(), then mmap() it into > > > executable memory. > > > > If there's a way to do this without running executable code that had to pass > > a previous execveat() check, then yeah, it's not effective (e.g. a Python > > interpreter that *doesn't* enforce execveat() is a trivial way to do it). > > > > Once arbitrary code is running, all bets are off. So long as all arbitrary > > code is being checked itself, it's allowed to do things that would bypass > > later checks (and it's up to whoever audited it in the first place to > > prevent this by not giving it the special mark that allows it to pass the > > check). > We will want to define what is considered as "arbitrary code is running" Using an example of ROP, attackers change the return address in stack, e.g. direct the execution flow to a gauge to call "ld.so /tmp/a.out", do you consider "arbitrary code is running" when stack is overwritten ? or after execve() is called. If it is later, this patch can prevent "ld.so /tmp/a.out". > Exactly. As explained in the patches, one crucial prerequisite is that > the executable code is trusted, and the system must provide integrity > guarantees. We cannot do anything without that. This patches series is > a building block to fix a blind spot on Linux systems to be able to > fully control executability. Even trusted executable can have a bug. I'm thinking in the context of ChromeOS, where all its system services are from trusted partitions, and legit code won't load .so from a non-exec mount. But we want to sandbox those services, so even under some kind of ROP attack, the service still won't be able to load .so from /tmp. Of course, if an attacker can already write arbitrary length of data into the stack, it is probably already a game over.
On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > allowed for execution. The main use case is for script interpreters and > > > dynamic linkers to check execution permission according to the kernel's > > > security policy. Another use case is to add context to access logs e.g., > > > which script (instead of interpreter) accessed a file. As any > > > executable code, scripts could also use this check [1]. > > > > > > This is different than faccessat(2) which only checks file access > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > real execution, user space gets the same error codes. > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > exec, shared object, script and config file (such as seccomp config), > > "config file" that contains executable code. > Is seccomp config considered as "contains executable code", seccomp config is translated into bpf, so maybe yes ? but bpf is running in the kernel. > > I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) in > > different use cases: > > > > execveat clearly has less code change, but that also means: we can't > > add logic specific to exec (i.e. logic that can't be applied to > > config) for this part (from do_execveat_common to > > security_bprm_creds_for_exec) in future. This would require some > > agreement/sign-off, I'm not sure from whom. > > I'm not sure to follow. We could still add new flags, but for now I > don't see use cases. This patch series is not meant to handle all > possible "trust checks", only executable code, which makes sense for the > kernel. > I guess the "configfile" discussion is where I get confused, at one point, I think this would become a generic "trust checks" api for everything related to "generating executable code", e.g. javascript, java code, and more. We will want to clearly define the scope of execveat(AT_CHECK) > If we want other checks, we'll need to clearly define their semantic and > align with the kernel. faccessat2(2) might be used to check other file > properties, but the executable property is not only defined by the file > attributes. > Agreed. > > > > -------------------------- > > now looked at user cases (focus on elf for now) > > > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd) > > > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so > > /usr/bin/some.out will pass AT_CHECK > > > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom > > /tmp/ld.so (I assume this is possible for elf header will set the > > path for ld.so because kernel has no knowledge of that, and > > binfmt_elf.c allocate memory for ld.so during execveat call) > > > > 4> dlopen(/tmp/a.so) > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so > > into memory. > > > > For case 1> > > Alternative solution: Because AT_CHECK is always called, I think we > > can avoid the first AT_CHECK call, and check during execveat(fd), > > There is no need to use AT_CHECK if we're going to call execveat(2) on > the same file descriptor. By design, AT_CHECK is implicit for any > execve(2). > Yes. I realized I was wrong to say that ld.so will call execve() for /tmp/a.out, there is no execve() call, otherwise it would have been blocked already today. The ld.so will mmap the /tmp/a.out directly. So case 1 is no different than case 2 and 4. ( the elf objects are mapped to memory by dynamic linker.) I'm not familiar with dynamic linker, Florian is on this thread, and can help to correct me if my guess is wrong. > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the > > benefit is that there is no TOCTOU and save one round trip of syscall > > for a succesful execveat() case. > > As long as user space uses the same file descriptor, there is no TOCTOU. > > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines > the user space security policy. The kernel already enforces the same > security policy for any execve(2), whatever are the calling process's > securebits. > > > > > For case 2> > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory. > > However, the process can all open then mmap() directly, it seems > > minimal effort for an attacker to walk around such a defence from > > dynamic linker. > > Which process? What do you mean by "can all open then mmap() directly"? > > In this context the dynamic linker (like its parent processes) is > trusted (guaranteed by the system). > > For case 2, the dynamic linker must check with AT_CHECK all files that > will be mapped, which include /usr/bin/some.out and /tmp/a.so > My point is that the process can work around this by mmap() the file directly. > > > > Alternative solution: > > dynamic linker call AT_CHECK for each .so, kernel will save the state > > (associated with fd) > > kernel will check fd state at the time of mmap(fd, executable memory) > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1 > > The idea with AT_CHECK is that there is no kernel side effect, no extra > kernel state, and the semantic is the same as with execve(2). > > This also enables us to check file's executable permission and ignore > it, which is useful in a "permissive mode" when preparing for a > migration without breaking a system, or to do extra integrity checks. For preparing a migration (detect all violations), this is useful. But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this seems to be weak, at least for elf loading case. > BTW, this use case would also be more complex with a new openat2(2) flag > like the original O_MAYEXEC. > > > > > Alternative solution 2: > > a new syscall to load the .so and enforce the AT_CHECK in kernel > > A new syscall would be overkill for this feature. Please see Linus's > comment. > maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap() to executable memory. > > > > This also means, for the solution to be complete, we might want to > > block creation of executable anonymous memory (e.g. by seccomp, ), > > How seccomp could create anonymous memory in user space? > seccomp filters should be treated (and checked with AT_CHECK) as > executable code anyway. > > > unless the user space can harden the creation of executable anonymous > > memory in some way. > > User space is already in charge of mmapping its own memory. I don't see > what is missing. > > > > > For case 3> > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > sure it passes AT_CHECK, before loading it into memory. > > All ELF dependencies are opened and checked with open_exec(), which > perform the main executability checks (with the __FMODE_EXEC flag). > Did I miss something? > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. The app can choose its own dynamic linker path during build, (maybe even statically link one ?) This is another reason that relying on a userspace only is not enough. > However, we must be careful with programs using the (deprecated) > uselib(2). They should also check with AT_CHECK because this syscall > opens the shared library without __FMODE_EXEC (similar to a simple file > open). See > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/ > > > > > For case 4> > > same as case 2. > > > > Consider those cases: I think: > > a> relying purely on userspace for enforcement does't seem to be > > effective, e.g. it is trivial to call open(), then mmap() it into > > executable memory. > > As Steve explained (and is also explained in the patches), it is trivial > if the attacker can already execute its own code, which is too late to > enforce any script execution control. > > > b> if both user space and kernel need to call AT_CHECK, the faccessat > > seems to be a better place for AT_CHECK, e.g. kernel can call > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will > > avoid complicating the execveat() code path. > > A previous version of this patches series already patched faccessat(2), > but this is not the right place. faccessat2(2) is dedicated to check > file permissions, not executability (e.g. with mount's noexec). > > > > > What do you think ? > > I think there are some misunderstandings. Please let me know if it's > clearer now. > I'm still not sure about the user case for dynamic linker (elf loading) case. Maybe this patch is more suitable for scripts? A detailed user case will help demonstrate the use case for dynamic linker, e.g. what kind of app will benefit from SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we dealing with , what kind of attack chain we blocked as a result. > > > > Thanks > > -Jeff > > > > > With the information that a script interpreter is about to interpret a > > > script, an LSM security policy can adjust caller's access rights or log > > > execution request as for native script execution (e.g. role transition). > > > This is possible thanks to the call to security_bprm_creds_for_exec(). > > > > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with > > > current kernel code should not be a security issue (e.g. unexpected role > > > transition). LSMs willing to update the caller's credential could now > > > do so when bprm->is_check is set. Of course, such policy change should > > > be in line with the new user space code. > > > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't > > > make sense for the kernel to parse the checked files, look for > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > > > if the format is unknown. Because of that, security_bprm_check() is > > > never called when AT_CHECK is used. > > > > > > It should be noted that script interpreters cannot directly use > > > execveat(2) (without this new AT_CHECK flag) because this could lead to > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being > > > executed to interpret the script. Unlike the kernel, script > > > interpreters may just interpret the shebang as a simple comment, which > > > should not change for backward compatibility reasons. > > > > > > Because scripts or libraries files might not currently have the > > > executable permission set, or because we might want specific users to be > > > allowed to run arbitrary scripts, the following patch provides a dynamic > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and > > > SECBIT_SHOULD_EXEC_RESTRICT securebits. > > > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > > > This patch has been used for more than a decade with customized script > > > interpreters. Some examples can be found here: > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC > > > > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > > Cc: Christian Brauner <brauner@kernel.org> > > > Cc: Kees Cook <keescook@chromium.org> > > > Cc: Paul Moore <paul@paul-moore.com> > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1] > > > Signed-off-by: Mickaël Salaün <mic@digikod.net> > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net > > > --- > > > > > > New design since v18: > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net > > > --- > > > fs/exec.c | 5 +++-- > > > include/linux/binfmts.h | 7 ++++++- > > > include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++ > > > kernel/audit.h | 1 + > > > kernel/auditsc.c | 1 + > > > 5 files changed, 41 insertions(+), 3 deletions(-) > > > > > > diff --git a/fs/exec.c b/fs/exec.c > > > index 40073142288f..ea2a1867afdc 100644 > > > --- a/fs/exec.c > > > +++ b/fs/exec.c > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > > > .lookup_flags = LOOKUP_FOLLOW, > > > }; > > > > > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > > > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0) > > > return ERR_PTR(-EINVAL); > > > if (flags & AT_SYMLINK_NOFOLLOW) > > > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl > > > bprm->filename = bprm->fdpath; > > > } > > > bprm->interp = bprm->filename; > > > + bprm->is_check = !!(flags & AT_CHECK); > > > > > > retval = bprm_mm_init(bprm); > > > if (!retval) > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > > > > > /* Set the unchanging part of bprm->cred */ > > > retval = security_bprm_creds_for_exec(bprm); > > > - if (retval) > > > + if (retval || bprm->is_check) > > > goto out; > > > > > > retval = exec_binprm(bprm); > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > > > index 70f97f685bff..8ff9c9e33aed 100644 > > > --- a/include/linux/binfmts.h > > > +++ b/include/linux/binfmts.h > > > @@ -42,7 +42,12 @@ struct linux_binprm { > > > * Set when errors can no longer be returned to the > > > * original userspace. > > > */ > > > - point_of_no_return:1; > > > + point_of_no_return:1, > > > + /* > > > + * Set by user space to check executability according to the > > > + * caller's environment. > > > + */ > > > + is_check:1; > > > struct file *executable; /* Executable to pass to the interpreter */ > > > struct file *interpreter; > > > struct file *file; > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > > > index c0bcc185fa48..bcd05c59b7df 100644 > > > --- a/include/uapi/linux/fcntl.h > > > +++ b/include/uapi/linux/fcntl.h > > > @@ -118,6 +118,36 @@ > > > #define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to > > > compare object identity and may not > > > be usable to open_by_handle_at(2) */ > > > + > > > +/* > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution > > > + * of this file would be allowed, ignoring the file format and then the related > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang). AT_CHECK > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling > > > + * thread. See securebits.h documentation. > > > + * > > > + * Programs should use this check to apply kernel-level checks against files > > > + * that are not directly executed by the kernel but directly passed to a user > > > + * space interpreter instead. All files that contain executable code, from the > > > + * point of view of the interpreter, should be checked. The main purpose of > > > + * this flag is to improve the security and consistency of an execution > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and > > > + * indirect file execution (e.g. sh script.sh) lead to the same result. For > > > + * instance, this can be used to check if a file is trustworthy according to > > > + * the caller's environment. > > > + * > > > + * In a secure environment, libraries and any executable dependencies should > > > + * also be checked. For instance dynamic linking should make sure that all > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using > > > + * LD_PRELOAD). For such secure execution environment to make sense, only > > > + * trusted code should be executable, which also requires integrity guarantees. > > > + * > > > + * To avoid race conditions leading to time-of-check to time-of-use issues, > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file > > > + * descriptor instead of a path. > > > + */ > > > +#define AT_CHECK 0x10000 > > > + > > > #if defined(__KERNEL__) > > > #define AT_GETATTR_NOSEC 0x80000000 > > > #endif > > > diff --git a/kernel/audit.h b/kernel/audit.h > > > index a60d2840559e..8ebdabd2ab81 100644 > > > --- a/kernel/audit.h > > > +++ b/kernel/audit.h > > > @@ -197,6 +197,7 @@ struct audit_context { > > > struct open_how openat2; > > > struct { > > > int argc; > > > + bool is_check; > > > } execve; > > > struct { > > > char *name; > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > > index 6f0d6fb6523f..b6316e284342 100644 > > > --- a/kernel/auditsc.c > > > +++ b/kernel/auditsc.c > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm) > > > > > > context->type = AUDIT_EXECVE; > > > context->execve.argc = bprm->argc; > > > + context->execve.is_check = bprm->is_check; > > > } > > > > > > > > > -- > > > 2.45.2 > > > > >
On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote: > > On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote: > > > > On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote: > >>> On 17/07/2024 07:33, Jeff Xu wrote: > >>> Consider those cases: I think: > >>> a> relying purely on userspace for enforcement does't seem to be > >>> effective, e.g. it is trivial to call open(), then mmap() it into > >>> executable memory. > >> > >> If there's a way to do this without running executable code that had to pass > >> a previous execveat() check, then yeah, it's not effective (e.g. a Python > >> interpreter that *doesn't* enforce execveat() is a trivial way to do it). > >> > >> Once arbitrary code is running, all bets are off. So long as all arbitrary > >> code is being checked itself, it's allowed to do things that would bypass > >> later checks (and it's up to whoever audited it in the first place to > >> prevent this by not giving it the special mark that allows it to pass the > >> check). > > > > Exactly. As explained in the patches, one crucial prerequisite is that > > the executable code is trusted, and the system must provide integrity > > guarantees. We cannot do anything without that. This patches series is > > a building block to fix a blind spot on Linux systems to be able to > > fully control executability. > > Circling back to my previous comment (did that ever get noticed?), I Yes, I replied to your comments. Did I miss something? > don’t think this is quite right: > > https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/ > > On a basic system configuration, a given path either may or may not be > executed. And maybe that path has some integrity check (dm-verity, > etc). So the kernel should tell the interpreter/loader whether the > target may be executed. All fine. > > But I think the more complex cases are more interesting, and the > “execute a program” process IS NOT BINARY. An attempt to execute can > be rejected outright, or it can be allowed *with a change to creds or > security context*. It would be entirely reasonable to have a policy > that allows execution of non-integrity-checked files but in a very > locked down context only. I guess you mean to transition to a sandbox when executing an untrusted file. This is a good idea. I talked about role transition in the patch's description: With the information that a script interpreter is about to interpret a script, an LSM security policy can adjust caller's access rights or log execution request as for native script execution (e.g. role transition). This is possible thanks to the call to security_bprm_creds_for_exec(). > > So… shouldn’t a patch series to this effect actually support this? > This patch series brings the minimal building blocks to have a consistent execution environment. Role transitions for script execution are left to LSMs. For instance, we could extend Landlock to automatically sandbox untrusted scripts.
On Wed, Jul 17, 2024 at 06:51:11PM -0700, Jeff Xu wrote: > On Wed, Jul 17, 2024 at 3:00 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote: > > > On 17/07/2024 07:33, Jeff Xu wrote: > > > > Consider those cases: I think: > > > > a> relying purely on userspace for enforcement does't seem to be > > > > effective, e.g. it is trivial to call open(), then mmap() it into > > > > executable memory. > > > > > > If there's a way to do this without running executable code that had to pass > > > a previous execveat() check, then yeah, it's not effective (e.g. a Python > > > interpreter that *doesn't* enforce execveat() is a trivial way to do it). > > > > > > Once arbitrary code is running, all bets are off. So long as all arbitrary > > > code is being checked itself, it's allowed to do things that would bypass > > > later checks (and it's up to whoever audited it in the first place to > > > prevent this by not giving it the special mark that allows it to pass the > > > check). > > > We will want to define what is considered as "arbitrary code is running" > > Using an example of ROP, attackers change the return address in stack, > e.g. direct the execution flow to a gauge to call "ld.so /tmp/a.out", > do you consider "arbitrary code is running" when stack is overwritten > ? or after execve() is called. Yes, ROP is arbitrary code execution (which can be mitigated with CFI). ROP could be enough to interpret custom commands and create a small interpreter/VM. > If it is later, this patch can prevent "ld.so /tmp/a.out". > > > Exactly. As explained in the patches, one crucial prerequisite is that > > the executable code is trusted, and the system must provide integrity > > guarantees. We cannot do anything without that. This patches series is > > a building block to fix a blind spot on Linux systems to be able to > > fully control executability. > > Even trusted executable can have a bug. Definitely, but this patch series is dedicated to script execution control. > > I'm thinking in the context of ChromeOS, where all its system services > are from trusted partitions, and legit code won't load .so from a > non-exec mount. But we want to sandbox those services, so even under > some kind of ROP attack, the service still won't be able to load .so > from /tmp. Of course, if an attacker can already write arbitrary > length of data into the stack, it is probably already a game over. > OK, you want to tie executable file permission to mmap. That makes sense if you have a consistent execution model. This can be enforced by LSMs. Contrary to script interpretation which is a full user space implementation (and then controlled by user space), mmap restrictions should indeed be enforced by the kernel.
On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > allowed for execution. The main use case is for script interpreters and > > > > dynamic linkers to check execution permission according to the kernel's > > > > security policy. Another use case is to add context to access logs e.g., > > > > which script (instead of interpreter) accessed a file. As any > > > > executable code, scripts could also use this check [1]. > > > > > > > > This is different than faccessat(2) which only checks file access > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > real execution, user space gets the same error codes. > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > exec, shared object, script and config file (such as seccomp config), > > > > "config file" that contains executable code. > > > Is seccomp config considered as "contains executable code", seccomp > config is translated into bpf, so maybe yes ? but bpf is running in > the kernel. Because seccomp filters alter syscalls, they are similar to code injection. > > > > I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) in > > > different use cases: > > > > > > execveat clearly has less code change, but that also means: we can't > > > add logic specific to exec (i.e. logic that can't be applied to > > > config) for this part (from do_execveat_common to > > > security_bprm_creds_for_exec) in future. This would require some > > > agreement/sign-off, I'm not sure from whom. > > > > I'm not sure to follow. We could still add new flags, but for now I > > don't see use cases. This patch series is not meant to handle all > > possible "trust checks", only executable code, which makes sense for the > > kernel. > > > I guess the "configfile" discussion is where I get confused, at one > point, I think this would become a generic "trust checks" api for > everything related to "generating executable code", e.g. javascript, > java code, and more. > We will want to clearly define the scope of execveat(AT_CHECK) The line between data and code is blurry. For instance, a configuration file can impact the execution flow of a program. So, where to draw the line? It might makes sense to follow the kernel and interpreter semantic: if a file can be executed by the kernel (e.g. ELF binary, file containing a shebang, or just configured with binfmt_misc), then this should be considered as executable code. This applies to Bash, Python, Javascript, NodeJS, PE, PHP... However, we can also make a picture executable with binfmt_misc. So, again, where to draw the line? I'd recommend to think about interaction with the outside, through function calls, IPCs, syscalls... For instance, "running" an image should not lead to reading or writing to arbitrary files, or accessing the network, but in practice it is legitimate for some file formats... PostScript is a programming language, but mostly used to draw pictures. So, again, where to draw the line? We should follow the principle of least astonishment. What most users would expect? This should follow the *common usage* of executable files. At the end, the script interpreters will be patched by security folks for security reasons. I think the right question to ask should be: could this file format be (ab)used to leak or modify arbitrary files, or to perform arbitrary syscalls? If the answer is yes, then it should be checked for executability. Of course, this excludes bugs exploited in the file format parser. I'll extend the next patch series with this rationale. > > > If we want other checks, we'll need to clearly define their semantic and > > align with the kernel. faccessat2(2) might be used to check other file > > properties, but the executable property is not only defined by the file > > attributes. > > > Agreed. > > > > > > > -------------------------- > > > now looked at user cases (focus on elf for now) > > > > > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount > > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd) > > > > > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so > > > /usr/bin/some.out will pass AT_CHECK > > > > > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so > > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom > > > /tmp/ld.so (I assume this is possible for elf header will set the > > > path for ld.so because kernel has no knowledge of that, and > > > binfmt_elf.c allocate memory for ld.so during execveat call) > > > > > > 4> dlopen(/tmp/a.so) > > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so > > > into memory. > > > > > > For case 1> > > > Alternative solution: Because AT_CHECK is always called, I think we > > > can avoid the first AT_CHECK call, and check during execveat(fd), > > > > There is no need to use AT_CHECK if we're going to call execveat(2) on > > the same file descriptor. By design, AT_CHECK is implicit for any > > execve(2). > > > Yes. I realized I was wrong to say that ld.so will call execve() for > /tmp/a.out, there is no execve() call, otherwise it would have been > blocked already today. > The ld.so will mmap the /tmp/a.out directly. So case 1 is no > different than case 2 and 4. ( the elf objects are mapped to memory > by dynamic linker.) > I'm not familiar with dynamic linker, Florian is on this thread, and > can help to correct me if my guess is wrong. > > > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the > > > benefit is that there is no TOCTOU and save one round trip of syscall > > > for a succesful execveat() case. > > > > As long as user space uses the same file descriptor, there is no TOCTOU. > > > > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines > > the user space security policy. The kernel already enforces the same > > security policy for any execve(2), whatever are the calling process's > > securebits. > > > > > > > > For case 2> > > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory. > > > However, the process can all open then mmap() directly, it seems > > > minimal effort for an attacker to walk around such a defence from > > > dynamic linker. > > > > Which process? What do you mean by "can all open then mmap() directly"? > > > > In this context the dynamic linker (like its parent processes) is > > trusted (guaranteed by the system). > > > > For case 2, the dynamic linker must check with AT_CHECK all files that > > will be mapped, which include /usr/bin/some.out and /tmp/a.so > > > My point is that the process can work around this by mmap() the file directly. Yes, see my answer in the other email. The process is trusted. > > > > > > > Alternative solution: > > > dynamic linker call AT_CHECK for each .so, kernel will save the state > > > (associated with fd) > > > kernel will check fd state at the time of mmap(fd, executable memory) > > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1 > > > > The idea with AT_CHECK is that there is no kernel side effect, no extra > > kernel state, and the semantic is the same as with execve(2). > > > > This also enables us to check file's executable permission and ignore > > it, which is useful in a "permissive mode" when preparing for a > > migration without breaking a system, or to do extra integrity checks. > For preparing a migration (detect all violations), this is useful. > But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this > seems to be weak, at least for elf loading case. We could add more restrictions, but that is outside the scope of this patch series. > > > BTW, this use case would also be more complex with a new openat2(2) flag > > like the original O_MAYEXEC. > > > > > > > > Alternative solution 2: > > > a new syscall to load the .so and enforce the AT_CHECK in kernel > > > > A new syscall would be overkill for this feature. Please see Linus's > > comment. > > > maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap() > to executable memory. OK, this is another story. > > > > > > > This also means, for the solution to be complete, we might want to > > > block creation of executable anonymous memory (e.g. by seccomp, ), > > > > How seccomp could create anonymous memory in user space? > > seccomp filters should be treated (and checked with AT_CHECK) as > > executable code anyway. > > > > > unless the user space can harden the creation of executable anonymous > > > memory in some way. > > > > User space is already in charge of mmapping its own memory. I don't see > > what is missing. > > > > > > > > For case 3> > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > > sure it passes AT_CHECK, before loading it into memory. > > > > All ELF dependencies are opened and checked with open_exec(), which > > perform the main executability checks (with the __FMODE_EXEC flag). > > Did I miss something? > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. > The app can choose its own dynamic linker path during build, (maybe > even statically link one ?) This is another reason that relying on a > userspace only is not enough. The kernel calls open_exec() on all dependencies, including ld-linux-x86-64.so.2, so these files are checked for executability too. > > > However, we must be careful with programs using the (deprecated) > > uselib(2). They should also check with AT_CHECK because this syscall > > opens the shared library without __FMODE_EXEC (similar to a simple file > > open). See > > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/ > > > > > > > > For case 4> > > > same as case 2. > > > > > > Consider those cases: I think: > > > a> relying purely on userspace for enforcement does't seem to be > > > effective, e.g. it is trivial to call open(), then mmap() it into > > > executable memory. > > > > As Steve explained (and is also explained in the patches), it is trivial > > if the attacker can already execute its own code, which is too late to > > enforce any script execution control. > > > > > b> if both user space and kernel need to call AT_CHECK, the faccessat > > > seems to be a better place for AT_CHECK, e.g. kernel can call > > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will > > > avoid complicating the execveat() code path. > > > > A previous version of this patches series already patched faccessat(2), > > but this is not the right place. faccessat2(2) is dedicated to check > > file permissions, not executability (e.g. with mount's noexec). > > > > > > > > What do you think ? > > > > I think there are some misunderstandings. Please let me know if it's > > clearer now. > > > I'm still not sure about the user case for dynamic linker (elf > loading) case. Maybe this patch is more suitable for scripts? It's suitable for both, but we could add more restriction on mmap with an (existing) LSM. The kernel already checks for mount's noexec when mapping a file, but not for the file permission, which is OK because it could be bypassed by coping the content of the file and mprotecting it anyway. For a consistent memory execution control, all memory mapping need to be restricted, which is out of scope for this patch series. > A detailed user case will help demonstrate the use case for dynamic > linker, e.g. what kind of app will benefit from > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we > dealing with , what kind of attack chain we blocked as a result. I explained that in the patches and in the description of these new securebits. Please point which part is not clear. The full threat model is simple: the TCB includes the kernel and system's files, which are integrity-protected, but we don't trust arbitrary data/scripts that can be written to user-owned files or directly provided to script interpreters. As for the ptrace restrictions, the dynamic linker restrictions helps to avoid trivial bypasses (e.g. with LD_PRELOAD) with consistent executability checks. > > > > > > > Thanks > > > -Jeff > > > > > > > With the information that a script interpreter is about to interpret a > > > > script, an LSM security policy can adjust caller's access rights or log > > > > execution request as for native script execution (e.g. role transition). > > > > This is possible thanks to the call to security_bprm_creds_for_exec(). > > > > > > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with > > > > current kernel code should not be a security issue (e.g. unexpected role > > > > transition). LSMs willing to update the caller's credential could now > > > > do so when bprm->is_check is set. Of course, such policy change should > > > > be in line with the new user space code. > > > > > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't > > > > make sense for the kernel to parse the checked files, look for > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > > > > if the format is unknown. Because of that, security_bprm_check() is > > > > never called when AT_CHECK is used. > > > > > > > > It should be noted that script interpreters cannot directly use > > > > execveat(2) (without this new AT_CHECK flag) because this could lead to > > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being > > > > executed to interpret the script. Unlike the kernel, script > > > > interpreters may just interpret the shebang as a simple comment, which > > > > should not change for backward compatibility reasons. > > > > > > > > Because scripts or libraries files might not currently have the > > > > executable permission set, or because we might want specific users to be > > > > allowed to run arbitrary scripts, the following patch provides a dynamic > > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and > > > > SECBIT_SHOULD_EXEC_RESTRICT securebits. > > > > > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > > > > This patch has been used for more than a decade with customized script > > > > interpreters. Some examples can be found here: > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC > > > > > > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > > > Cc: Christian Brauner <brauner@kernel.org> > > > > Cc: Kees Cook <keescook@chromium.org> > > > > Cc: Paul Moore <paul@paul-moore.com> > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1] > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net> > > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net > > > > --- > > > > > > > > New design since v18: > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net > > > > --- > > > > fs/exec.c | 5 +++-- > > > > include/linux/binfmts.h | 7 ++++++- > > > > include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++ > > > > kernel/audit.h | 1 + > > > > kernel/auditsc.c | 1 + > > > > 5 files changed, 41 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/fs/exec.c b/fs/exec.c > > > > index 40073142288f..ea2a1867afdc 100644 > > > > --- a/fs/exec.c > > > > +++ b/fs/exec.c > > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > > > > .lookup_flags = LOOKUP_FOLLOW, > > > > }; > > > > > > > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > > > > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0) > > > > return ERR_PTR(-EINVAL); > > > > if (flags & AT_SYMLINK_NOFOLLOW) > > > > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl > > > > bprm->filename = bprm->fdpath; > > > > } > > > > bprm->interp = bprm->filename; > > > > + bprm->is_check = !!(flags & AT_CHECK); > > > > > > > > retval = bprm_mm_init(bprm); > > > > if (!retval) > > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > > > > > > > /* Set the unchanging part of bprm->cred */ > > > > retval = security_bprm_creds_for_exec(bprm); > > > > - if (retval) > > > > + if (retval || bprm->is_check) > > > > goto out; > > > > > > > > retval = exec_binprm(bprm); > > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > > > > index 70f97f685bff..8ff9c9e33aed 100644 > > > > --- a/include/linux/binfmts.h > > > > +++ b/include/linux/binfmts.h > > > > @@ -42,7 +42,12 @@ struct linux_binprm { > > > > * Set when errors can no longer be returned to the > > > > * original userspace. > > > > */ > > > > - point_of_no_return:1; > > > > + point_of_no_return:1, > > > > + /* > > > > + * Set by user space to check executability according to the > > > > + * caller's environment. > > > > + */ > > > > + is_check:1; > > > > struct file *executable; /* Executable to pass to the interpreter */ > > > > struct file *interpreter; > > > > struct file *file; > > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > > > > index c0bcc185fa48..bcd05c59b7df 100644 > > > > --- a/include/uapi/linux/fcntl.h > > > > +++ b/include/uapi/linux/fcntl.h > > > > @@ -118,6 +118,36 @@ > > > > #define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to > > > > compare object identity and may not > > > > be usable to open_by_handle_at(2) */ > > > > + > > > > +/* > > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution > > > > + * of this file would be allowed, ignoring the file format and then the related > > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang). AT_CHECK > > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling > > > > + * thread. See securebits.h documentation. > > > > + * > > > > + * Programs should use this check to apply kernel-level checks against files > > > > + * that are not directly executed by the kernel but directly passed to a user > > > > + * space interpreter instead. All files that contain executable code, from the > > > > + * point of view of the interpreter, should be checked. The main purpose of > > > > + * this flag is to improve the security and consistency of an execution > > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and > > > > + * indirect file execution (e.g. sh script.sh) lead to the same result. For > > > > + * instance, this can be used to check if a file is trustworthy according to > > > > + * the caller's environment. > > > > + * > > > > + * In a secure environment, libraries and any executable dependencies should > > > > + * also be checked. For instance dynamic linking should make sure that all > > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using > > > > + * LD_PRELOAD). For such secure execution environment to make sense, only > > > > + * trusted code should be executable, which also requires integrity guarantees. > > > > + * > > > > + * To avoid race conditions leading to time-of-check to time-of-use issues, > > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file > > > > + * descriptor instead of a path. > > > > + */ > > > > +#define AT_CHECK 0x10000 > > > > + > > > > #if defined(__KERNEL__) > > > > #define AT_GETATTR_NOSEC 0x80000000 > > > > #endif > > > > diff --git a/kernel/audit.h b/kernel/audit.h > > > > index a60d2840559e..8ebdabd2ab81 100644 > > > > --- a/kernel/audit.h > > > > +++ b/kernel/audit.h > > > > @@ -197,6 +197,7 @@ struct audit_context { > > > > struct open_how openat2; > > > > struct { > > > > int argc; > > > > + bool is_check; > > > > } execve; > > > > struct { > > > > char *name; > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > > > index 6f0d6fb6523f..b6316e284342 100644 > > > > --- a/kernel/auditsc.c > > > > +++ b/kernel/auditsc.c > > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm) > > > > > > > > context->type = AUDIT_EXECVE; > > > > context->execve.argc = bprm->argc; > > > > + context->execve.is_check = bprm->is_check; > > > > } > > > > > > > > > > > > -- > > > > 2.45.2 > > > > > > > >
On Thu, 2024-07-18 at 14:24 +0200, Mickaël Salaün wrote: > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> > > wrote: > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: [...] > > > > I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) > > > > in different use cases: > > > > > > > > execveat clearly has less code change, but that also means: we > > > > can't add logic specific to exec (i.e. logic that can't be > > > > applied to config) for this part (from do_execveat_common to > > > > security_bprm_creds_for_exec) in future. This would require > > > > some agreement/sign-off, I'm not sure from whom. > > > > > > I'm not sure to follow. We could still add new flags, but for now > > > I don't see use cases. This patch series is not meant to handle > > > all possible "trust checks", only executable code, which makes > > > sense for the kernel. > > > > > I guess the "configfile" discussion is where I get confused, at one > > point, I think this would become a generic "trust checks" api for > > everything related to "generating executable code", e.g. > > javascript, java code, and more. We will want to clearly define the > > scope of execveat(AT_CHECK) > > The line between data and code is blurry. For instance, a > configuration file can impact the execution flow of a program. So, > where to draw the line? Having a way to have config files part of the trusted envelope, either by signing or measurement would be really useful. The current standard distro IMA deployment is signed executables, but not signed config because it's hard to construct a policy that doesn't force the signing of too many extraneous files (and files which might change often). > It might makes sense to follow the kernel and interpreter semantic: > if a file can be executed by the kernel (e.g. ELF binary, file > containing a shebang, or just configured with binfmt_misc), then this > should be considered as executable code. This applies to Bash, > Python, Javascript, NodeJS, PE, PHP... However, we can also make a > picture executable with binfmt_misc. So, again, where to draw the > line? Possibly by making open for config an indication executables can give? I'm not advocating doing it in this patch, but if we had an open for config indication, the LSMs could do much finer grained policy, especially if they knew which executable was trying to open the config file. It would allow things like an IMA policy saying if a signed executable is opening a config file, then that file must also be signed. James
On Wed, Jul 17, 2024 at 10:08 PM Jeff Xu <jeffxu@google.com> wrote: > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > allowed for execution. The main use case is for script interpreters and > > > > dynamic linkers to check execution permission according to the kernel's > > > > security policy. Another use case is to add context to access logs e.g., > > > > which script (instead of interpreter) accessed a file. As any > > > > executable code, scripts could also use this check [1]. > > > > > > > > This is different than faccessat(2) which only checks file access > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > real execution, user space gets the same error codes. > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > exec, shared object, script and config file (such as seccomp config), > > > > "config file" that contains executable code. > > > Is seccomp config considered as "contains executable code", seccomp > config is translated into bpf, so maybe yes ? but bpf is running in > the kernel. > > > > I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) in > > > different use cases: > > > > > > execveat clearly has less code change, but that also means: we can't > > > add logic specific to exec (i.e. logic that can't be applied to > > > config) for this part (from do_execveat_common to > > > security_bprm_creds_for_exec) in future. This would require some > > > agreement/sign-off, I'm not sure from whom. > > > > I'm not sure to follow. We could still add new flags, but for now I > > don't see use cases. This patch series is not meant to handle all > > possible "trust checks", only executable code, which makes sense for the > > kernel. > > > I guess the "configfile" discussion is where I get confused, at one > point, I think this would become a generic "trust checks" api for > everything related to "generating executable code", e.g. javascript, > java code, and more. > We will want to clearly define the scope of execveat(AT_CHECK) > > > If we want other checks, we'll need to clearly define their semantic and > > align with the kernel. faccessat2(2) might be used to check other file > > properties, but the executable property is not only defined by the file > > attributes. > > > Agreed. > > > > > > > -------------------------- > > > now looked at user cases (focus on elf for now) > > > > > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount > > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd) > > > > > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so > > > /usr/bin/some.out will pass AT_CHECK > > > > > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so > > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom > > > /tmp/ld.so (I assume this is possible for elf header will set the > > > path for ld.so because kernel has no knowledge of that, and > > > binfmt_elf.c allocate memory for ld.so during execveat call) > > > > > > 4> dlopen(/tmp/a.so) > > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so > > > into memory. > > > > > > For case 1> > > > Alternative solution: Because AT_CHECK is always called, I think we > > > can avoid the first AT_CHECK call, and check during execveat(fd), > > > > There is no need to use AT_CHECK if we're going to call execveat(2) on > > the same file descriptor. By design, AT_CHECK is implicit for any > > execve(2). > > > Yes. I realized I was wrong to say that ld.so will call execve() for > /tmp/a.out, there is no execve() call, otherwise it would have been > blocked already today. > The ld.so will mmap the /tmp/a.out directly. So case 1 is no > different than case 2 and 4. ( the elf objects are mapped to memory > by dynamic linker.) > I'm not familiar with dynamic linker, Florian is on this thread, and > can help to correct me if my guess is wrong. for Android, this has been the nail in the coffin of previous attempts to disallow running code from non-trusted filesystems --- instead of execing /tmp/a.out, the attacker just execs the linker with /tmp/a.out as an argument. people are doing this already in some cases, because we already have ineffectual "barriers" in place. [the usual argument for doing such things anyway is "it makes it harder to be doing this by _accident_".] the other workaround for the attacker is to copy and paste the entire dynamic linker source and change the bits they don't like :-) (if you're thinking "is that a thing?", yes, so much so that the idea has been independently reinvented multiple times by several major legit apps and by basically every piece of DRM middleware. which is why -- although i'm excited by mseal(2) -- i expect to face significant challenges rolling it out in Android _especially_ in places like "dynamic linker internal data structures" where i've wanted it for years!) this proposal feels like it _ought_ to let a defender tighten their seccomp filter to require a "safe" fd if i'm using mmap() with an fd, but in practice as long as JITs exist i can always just copy code into a non-fd-backed mmap() region. and -- from the perspective of Android, where all "apps" are code loaded into a Java runtime -- there's not much getting away from JITs. (and last i looked, ART -- Android's Java runtime -- uses memfd() for the JIT cache.) > > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the > > > benefit is that there is no TOCTOU and save one round trip of syscall > > > for a succesful execveat() case. > > > > As long as user space uses the same file descriptor, there is no TOCTOU. > > > > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines > > the user space security policy. The kernel already enforces the same > > security policy for any execve(2), whatever are the calling process's > > securebits. > > > > > > > > For case 2> > > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory. > > > However, the process can all open then mmap() directly, it seems > > > minimal effort for an attacker to walk around such a defence from > > > dynamic linker. > > > > Which process? What do you mean by "can all open then mmap() directly"? > > > > In this context the dynamic linker (like its parent processes) is > > trusted (guaranteed by the system). > > > > For case 2, the dynamic linker must check with AT_CHECK all files that > > will be mapped, which include /usr/bin/some.out and /tmp/a.so > > > My point is that the process can work around this by mmap() the file directly. > > > > > > > Alternative solution: > > > dynamic linker call AT_CHECK for each .so, kernel will save the state > > > (associated with fd) > > > kernel will check fd state at the time of mmap(fd, executable memory) > > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1 > > > > The idea with AT_CHECK is that there is no kernel side effect, no extra > > kernel state, and the semantic is the same as with execve(2). > > > > This also enables us to check file's executable permission and ignore > > it, which is useful in a "permissive mode" when preparing for a > > migration without breaking a system, or to do extra integrity checks. > For preparing a migration (detect all violations), this is useful. > But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this > seems to be weak, at least for elf loading case. > > > BTW, this use case would also be more complex with a new openat2(2) flag > > like the original O_MAYEXEC. > > > > > > > > Alternative solution 2: > > > a new syscall to load the .so and enforce the AT_CHECK in kernel > > > > A new syscall would be overkill for this feature. Please see Linus's > > comment. > > > maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap() > to executable memory. > > > > > > > This also means, for the solution to be complete, we might want to > > > block creation of executable anonymous memory (e.g. by seccomp, ), > > > > How seccomp could create anonymous memory in user space? > > seccomp filters should be treated (and checked with AT_CHECK) as > > executable code anyway. > > > > > unless the user space can harden the creation of executable anonymous > > > memory in some way. > > > > User space is already in charge of mmapping its own memory. I don't see > > what is missing. > > > > > > > > For case 3> > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > > sure it passes AT_CHECK, before loading it into memory. > > > > All ELF dependencies are opened and checked with open_exec(), which > > perform the main executability checks (with the __FMODE_EXEC flag). > > Did I miss something? > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. > The app can choose its own dynamic linker path during build, (maybe > even statically link one ?) This is another reason that relying on a > userspace only is not enough. > > > However, we must be careful with programs using the (deprecated) > > uselib(2). They should also check with AT_CHECK because this syscall > > opens the shared library without __FMODE_EXEC (similar to a simple file > > open). See > > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/ > > > > > > > > For case 4> > > > same as case 2. > > > > > > Consider those cases: I think: > > > a> relying purely on userspace for enforcement does't seem to be > > > effective, e.g. it is trivial to call open(), then mmap() it into > > > executable memory. > > > > As Steve explained (and is also explained in the patches), it is trivial > > if the attacker can already execute its own code, which is too late to > > enforce any script execution control. > > > > > b> if both user space and kernel need to call AT_CHECK, the faccessat > > > seems to be a better place for AT_CHECK, e.g. kernel can call > > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will > > > avoid complicating the execveat() code path. > > > > A previous version of this patches series already patched faccessat(2), > > but this is not the right place. faccessat2(2) is dedicated to check > > file permissions, not executability (e.g. with mount's noexec). > > > > > > > > What do you think ? > > > > I think there are some misunderstandings. Please let me know if it's > > clearer now. > > > I'm still not sure about the user case for dynamic linker (elf > loading) case. Maybe this patch is more suitable for scripts? > A detailed user case will help demonstrate the use case for dynamic > linker, e.g. what kind of app will benefit from > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we > dealing with , what kind of attack chain we blocked as a result. > > > > > > > Thanks > > > -Jeff > > > > > > > With the information that a script interpreter is about to interpret a > > > > script, an LSM security policy can adjust caller's access rights or log > > > > execution request as for native script execution (e.g. role transition). > > > > This is possible thanks to the call to security_bprm_creds_for_exec(). > > > > > > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with > > > > current kernel code should not be a security issue (e.g. unexpected role > > > > transition). LSMs willing to update the caller's credential could now > > > > do so when bprm->is_check is set. Of course, such policy change should > > > > be in line with the new user space code. > > > > > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't > > > > make sense for the kernel to parse the checked files, look for > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > > > > if the format is unknown. Because of that, security_bprm_check() is > > > > never called when AT_CHECK is used. > > > > > > > > It should be noted that script interpreters cannot directly use > > > > execveat(2) (without this new AT_CHECK flag) because this could lead to > > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being > > > > executed to interpret the script. Unlike the kernel, script > > > > interpreters may just interpret the shebang as a simple comment, which > > > > should not change for backward compatibility reasons. > > > > > > > > Because scripts or libraries files might not currently have the > > > > executable permission set, or because we might want specific users to be > > > > allowed to run arbitrary scripts, the following patch provides a dynamic > > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and > > > > SECBIT_SHOULD_EXEC_RESTRICT securebits. > > > > > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > > > > This patch has been used for more than a decade with customized script > > > > interpreters. Some examples can be found here: > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC > > > > > > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > > > Cc: Christian Brauner <brauner@kernel.org> > > > > Cc: Kees Cook <keescook@chromium.org> > > > > Cc: Paul Moore <paul@paul-moore.com> > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1] > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net> > > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net > > > > --- > > > > > > > > New design since v18: > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net > > > > --- > > > > fs/exec.c | 5 +++-- > > > > include/linux/binfmts.h | 7 ++++++- > > > > include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++ > > > > kernel/audit.h | 1 + > > > > kernel/auditsc.c | 1 + > > > > 5 files changed, 41 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/fs/exec.c b/fs/exec.c > > > > index 40073142288f..ea2a1867afdc 100644 > > > > --- a/fs/exec.c > > > > +++ b/fs/exec.c > > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > > > > .lookup_flags = LOOKUP_FOLLOW, > > > > }; > > > > > > > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > > > > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0) > > > > return ERR_PTR(-EINVAL); > > > > if (flags & AT_SYMLINK_NOFOLLOW) > > > > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl > > > > bprm->filename = bprm->fdpath; > > > > } > > > > bprm->interp = bprm->filename; > > > > + bprm->is_check = !!(flags & AT_CHECK); > > > > > > > > retval = bprm_mm_init(bprm); > > > > if (!retval) > > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > > > > > > > /* Set the unchanging part of bprm->cred */ > > > > retval = security_bprm_creds_for_exec(bprm); > > > > - if (retval) > > > > + if (retval || bprm->is_check) > > > > goto out; > > > > > > > > retval = exec_binprm(bprm); > > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > > > > index 70f97f685bff..8ff9c9e33aed 100644 > > > > --- a/include/linux/binfmts.h > > > > +++ b/include/linux/binfmts.h > > > > @@ -42,7 +42,12 @@ struct linux_binprm { > > > > * Set when errors can no longer be returned to the > > > > * original userspace. > > > > */ > > > > - point_of_no_return:1; > > > > + point_of_no_return:1, > > > > + /* > > > > + * Set by user space to check executability according to the > > > > + * caller's environment. > > > > + */ > > > > + is_check:1; > > > > struct file *executable; /* Executable to pass to the interpreter */ > > > > struct file *interpreter; > > > > struct file *file; > > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > > > > index c0bcc185fa48..bcd05c59b7df 100644 > > > > --- a/include/uapi/linux/fcntl.h > > > > +++ b/include/uapi/linux/fcntl.h > > > > @@ -118,6 +118,36 @@ > > > > #define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to > > > > compare object identity and may not > > > > be usable to open_by_handle_at(2) */ > > > > + > > > > +/* > > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution > > > > + * of this file would be allowed, ignoring the file format and then the related > > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang). AT_CHECK > > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling > > > > + * thread. See securebits.h documentation. > > > > + * > > > > + * Programs should use this check to apply kernel-level checks against files > > > > + * that are not directly executed by the kernel but directly passed to a user > > > > + * space interpreter instead. All files that contain executable code, from the > > > > + * point of view of the interpreter, should be checked. The main purpose of > > > > + * this flag is to improve the security and consistency of an execution > > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and > > > > + * indirect file execution (e.g. sh script.sh) lead to the same result. For > > > > + * instance, this can be used to check if a file is trustworthy according to > > > > + * the caller's environment. > > > > + * > > > > + * In a secure environment, libraries and any executable dependencies should > > > > + * also be checked. For instance dynamic linking should make sure that all > > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using > > > > + * LD_PRELOAD). For such secure execution environment to make sense, only > > > > + * trusted code should be executable, which also requires integrity guarantees. > > > > + * > > > > + * To avoid race conditions leading to time-of-check to time-of-use issues, > > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file > > > > + * descriptor instead of a path. > > > > + */ > > > > +#define AT_CHECK 0x10000 > > > > + > > > > #if defined(__KERNEL__) > > > > #define AT_GETATTR_NOSEC 0x80000000 > > > > #endif > > > > diff --git a/kernel/audit.h b/kernel/audit.h > > > > index a60d2840559e..8ebdabd2ab81 100644 > > > > --- a/kernel/audit.h > > > > +++ b/kernel/audit.h > > > > @@ -197,6 +197,7 @@ struct audit_context { > > > > struct open_how openat2; > > > > struct { > > > > int argc; > > > > + bool is_check; > > > > } execve; > > > > struct { > > > > char *name; > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > > > index 6f0d6fb6523f..b6316e284342 100644 > > > > --- a/kernel/auditsc.c > > > > +++ b/kernel/auditsc.c > > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm) > > > > > > > > context->type = AUDIT_EXECVE; > > > > context->execve.argc = bprm->argc; > > > > + context->execve.is_check = bprm->is_check; > > > > } > > > > > > > > > > > > -- > > > > 2.45.2 > > > > > > >
On Thu, Jul 18, 2024 at 09:03:36AM -0400, James Bottomley wrote: > On Thu, 2024-07-18 at 14:24 +0200, Mickaël Salaün wrote: > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> > > > wrote: > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > [...] > > > > > I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) > > > > > in different use cases: > > > > > > > > > > execveat clearly has less code change, but that also means: we > > > > > can't add logic specific to exec (i.e. logic that can't be > > > > > applied to config) for this part (from do_execveat_common to > > > > > security_bprm_creds_for_exec) in future. This would require > > > > > some agreement/sign-off, I'm not sure from whom. > > > > > > > > I'm not sure to follow. We could still add new flags, but for now > > > > I don't see use cases. This patch series is not meant to handle > > > > all possible "trust checks", only executable code, which makes > > > > sense for the kernel. > > > > > > > I guess the "configfile" discussion is where I get confused, at one > > > point, I think this would become a generic "trust checks" api for > > > everything related to "generating executable code", e.g. > > > javascript, java code, and more. We will want to clearly define the > > > scope of execveat(AT_CHECK) > > > > The line between data and code is blurry. For instance, a > > configuration file can impact the execution flow of a program. So, > > where to draw the line? > > Having a way to have config files part of the trusted envelope, either > by signing or measurement would be really useful. The current standard > distro IMA deployment is signed executables, but not signed config > because it's hard to construct a policy that doesn't force the signing > of too many extraneous files (and files which might change often). > > > It might makes sense to follow the kernel and interpreter semantic: > > if a file can be executed by the kernel (e.g. ELF binary, file > > containing a shebang, or just configured with binfmt_misc), then this > > should be considered as executable code. This applies to Bash, > > Python, Javascript, NodeJS, PE, PHP... However, we can also make a > > picture executable with binfmt_misc. So, again, where to draw the > > line? > > Possibly by making open for config an indication executables can give? > I'm not advocating doing it in this patch, but if we had an open for > config indication, the LSMs could do much finer grained policy, > especially if they knew which executable was trying to open the config > file. It would allow things like an IMA policy saying if a signed > executable is opening a config file, then that file must also be > signed. Checking configuration could be a next step, but not with this patch series. FYI, the previous version was a (too) generic syscall: https://lore.kernel.org/all/20220104155024.48023-1-mic@digikod.net/ One of the main concern was alignment with kernel semantic. For now, let's focus on script execution control. > > James >
On Thu, Jul 18, 2024 at 10:46:54AM -0400, enh wrote: > On Wed, Jul 17, 2024 at 10:08 PM Jeff Xu <jeffxu@google.com> wrote: > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > allowed for execution. The main use case is for script interpreters and > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > which script (instead of interpreter) accessed a file. As any > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > real execution, user space gets the same error codes. > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > "config file" that contains executable code. > > > > > Is seccomp config considered as "contains executable code", seccomp > > config is translated into bpf, so maybe yes ? but bpf is running in > > the kernel. > > > > > > I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) in > > > > different use cases: > > > > > > > > execveat clearly has less code change, but that also means: we can't > > > > add logic specific to exec (i.e. logic that can't be applied to > > > > config) for this part (from do_execveat_common to > > > > security_bprm_creds_for_exec) in future. This would require some > > > > agreement/sign-off, I'm not sure from whom. > > > > > > I'm not sure to follow. We could still add new flags, but for now I > > > don't see use cases. This patch series is not meant to handle all > > > possible "trust checks", only executable code, which makes sense for the > > > kernel. > > > > > I guess the "configfile" discussion is where I get confused, at one > > point, I think this would become a generic "trust checks" api for > > everything related to "generating executable code", e.g. javascript, > > java code, and more. > > We will want to clearly define the scope of execveat(AT_CHECK) > > > > > If we want other checks, we'll need to clearly define their semantic and > > > align with the kernel. faccessat2(2) might be used to check other file > > > properties, but the executable property is not only defined by the file > > > attributes. > > > > > Agreed. > > > > > > > > > > -------------------------- > > > > now looked at user cases (focus on elf for now) > > > > > > > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount > > > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd) > > > > > > > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so > > > > /usr/bin/some.out will pass AT_CHECK > > > > > > > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so > > > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom > > > > /tmp/ld.so (I assume this is possible for elf header will set the > > > > path for ld.so because kernel has no knowledge of that, and > > > > binfmt_elf.c allocate memory for ld.so during execveat call) > > > > > > > > 4> dlopen(/tmp/a.so) > > > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so > > > > into memory. > > > > > > > > For case 1> > > > > Alternative solution: Because AT_CHECK is always called, I think we > > > > can avoid the first AT_CHECK call, and check during execveat(fd), > > > > > > There is no need to use AT_CHECK if we're going to call execveat(2) on > > > the same file descriptor. By design, AT_CHECK is implicit for any > > > execve(2). > > > > > Yes. I realized I was wrong to say that ld.so will call execve() for > > /tmp/a.out, there is no execve() call, otherwise it would have been > > blocked already today. > > The ld.so will mmap the /tmp/a.out directly. So case 1 is no > > different than case 2 and 4. ( the elf objects are mapped to memory > > by dynamic linker.) > > I'm not familiar with dynamic linker, Florian is on this thread, and > > can help to correct me if my guess is wrong. > > for Android, this has been the nail in the coffin of previous attempts > to disallow running code from non-trusted filesystems --- instead of > execing /tmp/a.out, the attacker just execs the linker with /tmp/a.out > as an argument. people are doing this already in some cases, because > we already have ineffectual "barriers" in place. [the usual argument > for doing such things anyway is "it makes it harder to be doing this > by _accident_".] This AT_CHECK and related securebits should cover this case. > > the other workaround for the attacker is to copy and paste the entire > dynamic linker source and change the bits they don't like :-) (if > you're thinking "is that a thing?", yes, so much so that the idea has > been independently reinvented multiple times by several major legit > apps and by basically every piece of DRM middleware. which is why -- > although i'm excited by mseal(2) -- i expect to face significant > challenges rolling it out in Android _especially_ in places like > "dynamic linker internal data structures" where i've wanted it for > years!) > > this proposal feels like it _ought_ to let a defender tighten their > seccomp filter to require a "safe" fd if i'm using mmap() with an fd, > but in practice as long as JITs exist i can always just copy code into > a non-fd-backed mmap() region. and -- from the perspective of Android, > where all "apps" are code loaded into a Java runtime -- there's not > much getting away from JITs. (and last i looked, ART -- Android's Java > runtime -- uses memfd() for the JIT cache.) Using the feature brought by this patch series makes sense for trusted executables willing to enforce a security policy. Untrusted ones should be sandboxed, which is the case for Android apps. > > > > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the > > > > benefit is that there is no TOCTOU and save one round trip of syscall > > > > for a succesful execveat() case. > > > > > > As long as user space uses the same file descriptor, there is no TOCTOU. > > > > > > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines > > > the user space security policy. The kernel already enforces the same > > > security policy for any execve(2), whatever are the calling process's > > > securebits. > > > > > > > > > > > For case 2> > > > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory. > > > > However, the process can all open then mmap() directly, it seems > > > > minimal effort for an attacker to walk around such a defence from > > > > dynamic linker. > > > > > > Which process? What do you mean by "can all open then mmap() directly"? > > > > > > In this context the dynamic linker (like its parent processes) is > > > trusted (guaranteed by the system). > > > > > > For case 2, the dynamic linker must check with AT_CHECK all files that > > > will be mapped, which include /usr/bin/some.out and /tmp/a.so > > > > > My point is that the process can work around this by mmap() the file directly. > > > > > > > > > > Alternative solution: > > > > dynamic linker call AT_CHECK for each .so, kernel will save the state > > > > (associated with fd) > > > > kernel will check fd state at the time of mmap(fd, executable memory) > > > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1 > > > > > > The idea with AT_CHECK is that there is no kernel side effect, no extra > > > kernel state, and the semantic is the same as with execve(2). > > > > > > This also enables us to check file's executable permission and ignore > > > it, which is useful in a "permissive mode" when preparing for a > > > migration without breaking a system, or to do extra integrity checks. > > For preparing a migration (detect all violations), this is useful. > > But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this > > seems to be weak, at least for elf loading case. > > > > > BTW, this use case would also be more complex with a new openat2(2) flag > > > like the original O_MAYEXEC. > > > > > > > > > > > Alternative solution 2: > > > > a new syscall to load the .so and enforce the AT_CHECK in kernel > > > > > > A new syscall would be overkill for this feature. Please see Linus's > > > comment. > > > > > maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap() > > to executable memory. > > > > > > > > > > This also means, for the solution to be complete, we might want to > > > > block creation of executable anonymous memory (e.g. by seccomp, ), > > > > > > How seccomp could create anonymous memory in user space? > > > seccomp filters should be treated (and checked with AT_CHECK) as > > > executable code anyway. > > > > > > > unless the user space can harden the creation of executable anonymous > > > > memory in some way. > > > > > > User space is already in charge of mmapping its own memory. I don't see > > > what is missing. > > > > > > > > > > > For case 3> > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > > > sure it passes AT_CHECK, before loading it into memory. > > > > > > All ELF dependencies are opened and checked with open_exec(), which > > > perform the main executability checks (with the __FMODE_EXEC flag). > > > Did I miss something? > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. > > The app can choose its own dynamic linker path during build, (maybe > > even statically link one ?) This is another reason that relying on a > > userspace only is not enough. > > > > > However, we must be careful with programs using the (deprecated) > > > uselib(2). They should also check with AT_CHECK because this syscall > > > opens the shared library without __FMODE_EXEC (similar to a simple file > > > open). See > > > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/ > > > > > > > > > > > For case 4> > > > > same as case 2. > > > > > > > > Consider those cases: I think: > > > > a> relying purely on userspace for enforcement does't seem to be > > > > effective, e.g. it is trivial to call open(), then mmap() it into > > > > executable memory. > > > > > > As Steve explained (and is also explained in the patches), it is trivial > > > if the attacker can already execute its own code, which is too late to > > > enforce any script execution control. > > > > > > > b> if both user space and kernel need to call AT_CHECK, the faccessat > > > > seems to be a better place for AT_CHECK, e.g. kernel can call > > > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will > > > > avoid complicating the execveat() code path. > > > > > > A previous version of this patches series already patched faccessat(2), > > > but this is not the right place. faccessat2(2) is dedicated to check > > > file permissions, not executability (e.g. with mount's noexec). > > > > > > > > > > > What do you think ? > > > > > > I think there are some misunderstandings. Please let me know if it's > > > clearer now. > > > > > I'm still not sure about the user case for dynamic linker (elf > > loading) case. Maybe this patch is more suitable for scripts? > > A detailed user case will help demonstrate the use case for dynamic > > linker, e.g. what kind of app will benefit from > > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we > > dealing with , what kind of attack chain we blocked as a result. > > > > > > > > > > Thanks > > > > -Jeff > > > > > > > > > With the information that a script interpreter is about to interpret a > > > > > script, an LSM security policy can adjust caller's access rights or log > > > > > execution request as for native script execution (e.g. role transition). > > > > > This is possible thanks to the call to security_bprm_creds_for_exec(). > > > > > > > > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with > > > > > current kernel code should not be a security issue (e.g. unexpected role > > > > > transition). LSMs willing to update the caller's credential could now > > > > > do so when bprm->is_check is set. Of course, such policy change should > > > > > be in line with the new user space code. > > > > > > > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't > > > > > make sense for the kernel to parse the checked files, look for > > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > > > > > if the format is unknown. Because of that, security_bprm_check() is > > > > > never called when AT_CHECK is used. > > > > > > > > > > It should be noted that script interpreters cannot directly use > > > > > execveat(2) (without this new AT_CHECK flag) because this could lead to > > > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being > > > > > executed to interpret the script. Unlike the kernel, script > > > > > interpreters may just interpret the shebang as a simple comment, which > > > > > should not change for backward compatibility reasons. > > > > > > > > > > Because scripts or libraries files might not currently have the > > > > > executable permission set, or because we might want specific users to be > > > > > allowed to run arbitrary scripts, the following patch provides a dynamic > > > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and > > > > > SECBIT_SHOULD_EXEC_RESTRICT securebits. > > > > > > > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > > > > > This patch has been used for more than a decade with customized script > > > > > interpreters. Some examples can be found here: > > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC > > > > > > > > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > > > > Cc: Christian Brauner <brauner@kernel.org> > > > > > Cc: Kees Cook <keescook@chromium.org> > > > > > Cc: Paul Moore <paul@paul-moore.com> > > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1] > > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net> > > > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net > > > > > --- > > > > > > > > > > New design since v18: > > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net > > > > > --- > > > > > fs/exec.c | 5 +++-- > > > > > include/linux/binfmts.h | 7 ++++++- > > > > > include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++ > > > > > kernel/audit.h | 1 + > > > > > kernel/auditsc.c | 1 + > > > > > 5 files changed, 41 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/fs/exec.c b/fs/exec.c > > > > > index 40073142288f..ea2a1867afdc 100644 > > > > > --- a/fs/exec.c > > > > > +++ b/fs/exec.c > > > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > > > > > .lookup_flags = LOOKUP_FOLLOW, > > > > > }; > > > > > > > > > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > > > > > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0) > > > > > return ERR_PTR(-EINVAL); > > > > > if (flags & AT_SYMLINK_NOFOLLOW) > > > > > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > > > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl > > > > > bprm->filename = bprm->fdpath; > > > > > } > > > > > bprm->interp = bprm->filename; > > > > > + bprm->is_check = !!(flags & AT_CHECK); > > > > > > > > > > retval = bprm_mm_init(bprm); > > > > > if (!retval) > > > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > > > > > > > > > /* Set the unchanging part of bprm->cred */ > > > > > retval = security_bprm_creds_for_exec(bprm); > > > > > - if (retval) > > > > > + if (retval || bprm->is_check) > > > > > goto out; > > > > > > > > > > retval = exec_binprm(bprm); > > > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > > > > > index 70f97f685bff..8ff9c9e33aed 100644 > > > > > --- a/include/linux/binfmts.h > > > > > +++ b/include/linux/binfmts.h > > > > > @@ -42,7 +42,12 @@ struct linux_binprm { > > > > > * Set when errors can no longer be returned to the > > > > > * original userspace. > > > > > */ > > > > > - point_of_no_return:1; > > > > > + point_of_no_return:1, > > > > > + /* > > > > > + * Set by user space to check executability according to the > > > > > + * caller's environment. > > > > > + */ > > > > > + is_check:1; > > > > > struct file *executable; /* Executable to pass to the interpreter */ > > > > > struct file *interpreter; > > > > > struct file *file; > > > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > > > > > index c0bcc185fa48..bcd05c59b7df 100644 > > > > > --- a/include/uapi/linux/fcntl.h > > > > > +++ b/include/uapi/linux/fcntl.h > > > > > @@ -118,6 +118,36 @@ > > > > > #define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to > > > > > compare object identity and may not > > > > > be usable to open_by_handle_at(2) */ > > > > > + > > > > > +/* > > > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution > > > > > + * of this file would be allowed, ignoring the file format and then the related > > > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang). AT_CHECK > > > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling > > > > > + * thread. See securebits.h documentation. > > > > > + * > > > > > + * Programs should use this check to apply kernel-level checks against files > > > > > + * that are not directly executed by the kernel but directly passed to a user > > > > > + * space interpreter instead. All files that contain executable code, from the > > > > > + * point of view of the interpreter, should be checked. The main purpose of > > > > > + * this flag is to improve the security and consistency of an execution > > > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and > > > > > + * indirect file execution (e.g. sh script.sh) lead to the same result. For > > > > > + * instance, this can be used to check if a file is trustworthy according to > > > > > + * the caller's environment. > > > > > + * > > > > > + * In a secure environment, libraries and any executable dependencies should > > > > > + * also be checked. For instance dynamic linking should make sure that all > > > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using > > > > > + * LD_PRELOAD). For such secure execution environment to make sense, only > > > > > + * trusted code should be executable, which also requires integrity guarantees. > > > > > + * > > > > > + * To avoid race conditions leading to time-of-check to time-of-use issues, > > > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file > > > > > + * descriptor instead of a path. > > > > > + */ > > > > > +#define AT_CHECK 0x10000 > > > > > + > > > > > #if defined(__KERNEL__) > > > > > #define AT_GETATTR_NOSEC 0x80000000 > > > > > #endif > > > > > diff --git a/kernel/audit.h b/kernel/audit.h > > > > > index a60d2840559e..8ebdabd2ab81 100644 > > > > > --- a/kernel/audit.h > > > > > +++ b/kernel/audit.h > > > > > @@ -197,6 +197,7 @@ struct audit_context { > > > > > struct open_how openat2; > > > > > struct { > > > > > int argc; > > > > > + bool is_check; > > > > > } execve; > > > > > struct { > > > > > char *name; > > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > > > > index 6f0d6fb6523f..b6316e284342 100644 > > > > > --- a/kernel/auditsc.c > > > > > +++ b/kernel/auditsc.c > > > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm) > > > > > > > > > > context->type = AUDIT_EXECVE; > > > > > context->execve.argc = bprm->argc; > > > > > + context->execve.is_check = bprm->is_check; > > > > > } > > > > > > > > > > > > > > > -- > > > > > 2.45.2 > > > > > > > > > >
On Thu, Jul 18, 2024 at 5:23 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Wed, Jul 17, 2024 at 06:51:11PM -0700, Jeff Xu wrote: > > On Wed, Jul 17, 2024 at 3:00 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote: > > > > On 17/07/2024 07:33, Jeff Xu wrote: > > > > > Consider those cases: I think: > > > > > a> relying purely on userspace for enforcement does't seem to be > > > > > effective, e.g. it is trivial to call open(), then mmap() it into > > > > > executable memory. > > > > > > > > If there's a way to do this without running executable code that had to pass > > > > a previous execveat() check, then yeah, it's not effective (e.g. a Python > > > > interpreter that *doesn't* enforce execveat() is a trivial way to do it). > > > > > > > > Once arbitrary code is running, all bets are off. So long as all arbitrary > > > > code is being checked itself, it's allowed to do things that would bypass > > > > later checks (and it's up to whoever audited it in the first place to > > > > prevent this by not giving it the special mark that allows it to pass the > > > > check). > > > > > We will want to define what is considered as "arbitrary code is running" > > > > Using an example of ROP, attackers change the return address in stack, > > e.g. direct the execution flow to a gauge to call "ld.so /tmp/a.out", > > do you consider "arbitrary code is running" when stack is overwritten > > ? or after execve() is called. > > Yes, ROP is arbitrary code execution (which can be mitigated with CFI). > ROP could be enough to interpret custom commands and create a small > interpreter/VM. > > > If it is later, this patch can prevent "ld.so /tmp/a.out". > > > > > Exactly. As explained in the patches, one crucial prerequisite is that > > > the executable code is trusted, and the system must provide integrity > > > guarantees. We cannot do anything without that. This patches series is > > > a building block to fix a blind spot on Linux systems to be able to > > > fully control executability. > > > > Even trusted executable can have a bug. > > Definitely, but this patch series is dedicated to script execution > control. > > > > > I'm thinking in the context of ChromeOS, where all its system services > > are from trusted partitions, and legit code won't load .so from a > > non-exec mount. But we want to sandbox those services, so even under > > some kind of ROP attack, the service still won't be able to load .so > > from /tmp. Of course, if an attacker can already write arbitrary > > length of data into the stack, it is probably already a game over. > > > > OK, you want to tie executable file permission to mmap. That makes > sense if you have a consistent execution model. This can be enforced by > LSMs. Contrary to script interpretation which is a full user space > implementation (and then controlled by user space), mmap restrictions > should indeed be enforced by the kernel. Ya, that is what I meant. it can be out of scope for this patch. Indeed, as you point out, this patch is dedicated to script execution control, and fixing ld.so /tmp/a.out is an extra bonus in addition to script. Thanks -Jeff
On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > allowed for execution. The main use case is for script interpreters and > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > which script (instead of interpreter) accessed a file. As any > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > real execution, user space gets the same error codes. > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > "config file" that contains executable code. > > > > > Is seccomp config considered as "contains executable code", seccomp > > config is translated into bpf, so maybe yes ? but bpf is running in > > the kernel. > > Because seccomp filters alter syscalls, they are similar to code > injection. > that makes sense. > > > > > > I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) in > > > > different use cases: > > > > > > > > execveat clearly has less code change, but that also means: we can't > > > > add logic specific to exec (i.e. logic that can't be applied to > > > > config) for this part (from do_execveat_common to > > > > security_bprm_creds_for_exec) in future. This would require some > > > > agreement/sign-off, I'm not sure from whom. > > > > > > I'm not sure to follow. We could still add new flags, but for now I > > > don't see use cases. This patch series is not meant to handle all > > > possible "trust checks", only executable code, which makes sense for the > > > kernel. > > > > > I guess the "configfile" discussion is where I get confused, at one > > point, I think this would become a generic "trust checks" api for > > everything related to "generating executable code", e.g. javascript, > > java code, and more. > > We will want to clearly define the scope of execveat(AT_CHECK) > > The line between data and code is blurry. For instance, a configuration > file can impact the execution flow of a program. So, where to draw the > line? > > It might makes sense to follow the kernel and interpreter semantic: if a > file can be executed by the kernel (e.g. ELF binary, file containing a > shebang, or just configured with binfmt_misc), then this should be > considered as executable code. This applies to Bash, Python, > Javascript, NodeJS, PE, PHP... However, we can also make a picture > executable with binfmt_misc. So, again, where to draw the line? > > I'd recommend to think about interaction with the outside, through > function calls, IPCs, syscalls... For instance, "running" an image > should not lead to reading or writing to arbitrary files, or accessing > the network, but in practice it is legitimate for some file formats... > PostScript is a programming language, but mostly used to draw pictures. > So, again, where to draw the line? > > We should follow the principle of least astonishment. What most users > would expect? This should follow the *common usage* of executable > files. At the end, the script interpreters will be patched by security > folks for security reasons. I think the right question to ask should > be: could this file format be (ab)used to leak or modify arbitrary > files, or to perform arbitrary syscalls? If the answer is yes, then it > should be checked for executability. Of course, this excludes bugs > exploited in the file format parser. > > I'll extend the next patch series with this rationale. > > > > > > If we want other checks, we'll need to clearly define their semantic and > > > align with the kernel. faccessat2(2) might be used to check other file > > > properties, but the executable property is not only defined by the file > > > attributes. > > > > > Agreed. > > > > > > > > > > -------------------------- > > > > now looked at user cases (focus on elf for now) > > > > > > > > 1> ld.so /tmp/a.out, /tmp/a.out is on non-exec mount > > > > dynamic linker will first call execveat(fd, AT_CHECK) then execveat(fd) > > > > > > > > 2> execve(/usr/bin/some.out) and some.out has dependency on /tmp/a.so > > > > /usr/bin/some.out will pass AT_CHECK > > > > > > > > 3> execve(usr/bin/some.out) and some.out uses custom /tmp/ld.so > > > > /usr/bin/some.out will pass AT_CHECK, however, it uses a custom > > > > /tmp/ld.so (I assume this is possible for elf header will set the > > > > path for ld.so because kernel has no knowledge of that, and > > > > binfmt_elf.c allocate memory for ld.so during execveat call) > > > > > > > > 4> dlopen(/tmp/a.so) > > > > I assume dynamic linker will call execveat(AT_CHECK), before map a.so > > > > into memory. > > > > > > > > For case 1> > > > > Alternative solution: Because AT_CHECK is always called, I think we > > > > can avoid the first AT_CHECK call, and check during execveat(fd), > > > > > > There is no need to use AT_CHECK if we're going to call execveat(2) on > > > the same file descriptor. By design, AT_CHECK is implicit for any > > > execve(2). > > > > > Yes. I realized I was wrong to say that ld.so will call execve() for > > /tmp/a.out, there is no execve() call, otherwise it would have been > > blocked already today. > > The ld.so will mmap the /tmp/a.out directly. So case 1 is no > > different than case 2 and 4. ( the elf objects are mapped to memory > > by dynamic linker.) > > I'm not familiar with dynamic linker, Florian is on this thread, and > > can help to correct me if my guess is wrong. > > > > > > this means the kernel will enforce SECBIT_EXEC_RESTRICT_FILE = 1, the > > > > benefit is that there is no TOCTOU and save one round trip of syscall > > > > for a succesful execveat() case. > > > > > > As long as user space uses the same file descriptor, there is no TOCTOU. > > > > > > SECBIT_EXEC_RESTRICT_FILE only makes sense for user space: it defines > > > the user space security policy. The kernel already enforces the same > > > security policy for any execve(2), whatever are the calling process's > > > securebits. > > > > > > > > > > > For case 2> > > > > dynamic linker will call execve(AT_CHECK), then mmap(fd) into memory. > > > > However, the process can all open then mmap() directly, it seems > > > > minimal effort for an attacker to walk around such a defence from > > > > dynamic linker. > > > > > > Which process? What do you mean by "can all open then mmap() directly"? > > > > > > In this context the dynamic linker (like its parent processes) is > > > trusted (guaranteed by the system). > > > > > > For case 2, the dynamic linker must check with AT_CHECK all files that > > > will be mapped, which include /usr/bin/some.out and /tmp/a.so > > > > > My point is that the process can work around this by mmap() the file directly. > > Yes, see my answer in the other email. The process is trusted. > OK. Let's agree that this is out of scope for this patch series. > > > > > > > > > > Alternative solution: > > > > dynamic linker call AT_CHECK for each .so, kernel will save the state > > > > (associated with fd) > > > > kernel will check fd state at the time of mmap(fd, executable memory) > > > > and enforce SECBIT_EXEC_RESTRICT_FILE = 1 > > > > > > The idea with AT_CHECK is that there is no kernel side effect, no extra > > > kernel state, and the semantic is the same as with execve(2). > > > > > > This also enables us to check file's executable permission and ignore > > > it, which is useful in a "permissive mode" when preparing for a > > > migration without breaking a system, or to do extra integrity checks. > > For preparing a migration (detect all violations), this is useful. > > But as a defense mechanism (SECBIT_EXEC_RESTRICT_FILE = 1) , this > > seems to be weak, at least for elf loading case. > > We could add more restrictions, but that is outside the scope of this > patch series. > Agreed. > > > > > BTW, this use case would also be more complex with a new openat2(2) flag > > > like the original O_MAYEXEC. > > > > > > > > > > > Alternative solution 2: > > > > a new syscall to load the .so and enforce the AT_CHECK in kernel > > > > > > A new syscall would be overkill for this feature. Please see Linus's > > > comment. > > > > > maybe, I was thinking on how to prevent "/tmp/a.o" from getting mmap() > > to executable memory. > > OK, this is another story. > > > > > > > > > > > This also means, for the solution to be complete, we might want to > > > > block creation of executable anonymous memory (e.g. by seccomp, ), > > > > > > How seccomp could create anonymous memory in user space? > > > seccomp filters should be treated (and checked with AT_CHECK) as > > > executable code anyway. > > > > > > > unless the user space can harden the creation of executable anonymous > > > > memory in some way. > > > > > > User space is already in charge of mmapping its own memory. I don't see > > > what is missing. > > > > > > > > > > > For case 3> > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > > > sure it passes AT_CHECK, before loading it into memory. > > > > > > All ELF dependencies are opened and checked with open_exec(), which > > > perform the main executability checks (with the __FMODE_EXEC flag). > > > Did I miss something? > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. > > The app can choose its own dynamic linker path during build, (maybe > > even statically link one ?) This is another reason that relying on a > > userspace only is not enough. > > The kernel calls open_exec() on all dependencies, including > ld-linux-x86-64.so.2, so these files are checked for executability too. > This might not be entirely true. iiuc, kernel calls open_exec for open_exec for interpreter, but not all its dependency (e.g. libc.so.6) load_elf_binary() { interpreter = open_exec(elf_interpreter); } libc.so.6 is opened and mapped by dynamic linker. so the call sequence is: execve(a.out) - open exec(a.out) - security_bprm_creds(a.out) - open the exec(ld.so) - call open_exec() for interruptor (ld.so) - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through the same check and code path as libc.so below ? - transfer the control to ld.so) - ld.so open (libc.so) - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch, require dynamic linker change. - ld.so mmap(libc.so,rx) > > > > > However, we must be careful with programs using the (deprecated) > > > uselib(2). They should also check with AT_CHECK because this syscall > > > opens the shared library without __FMODE_EXEC (similar to a simple file > > > open). See > > > https://lore.kernel.org/all/CAHk-=wiUwRG7LuR=z5sbkFVGQh+7qVB6_1NM0Ny9SVNL1Un4Sw@mail.gmail.com/ > > > > > > > > > > > For case 4> > > > > same as case 2. > > > > > > > > Consider those cases: I think: > > > > a> relying purely on userspace for enforcement does't seem to be > > > > effective, e.g. it is trivial to call open(), then mmap() it into > > > > executable memory. > > > > > > As Steve explained (and is also explained in the patches), it is trivial > > > if the attacker can already execute its own code, which is too late to > > > enforce any script execution control. > > > > > > > b> if both user space and kernel need to call AT_CHECK, the faccessat > > > > seems to be a better place for AT_CHECK, e.g. kernel can call > > > > do_faccessat(AT_CHECK) and userspace can call faccessat(). This will > > > > avoid complicating the execveat() code path. > > > > > > A previous version of this patches series already patched faccessat(2), > > > but this is not the right place. faccessat2(2) is dedicated to check > > > file permissions, not executability (e.g. with mount's noexec). > > > > > > > > > > > What do you think ? > > > > > > I think there are some misunderstandings. Please let me know if it's > > > clearer now. > > > > > I'm still not sure about the user case for dynamic linker (elf > > loading) case. Maybe this patch is more suitable for scripts? > > It's suitable for both, but we could add more restriction on mmap > with an (existing) LSM. The kernel already checks for mount's noexec > when mapping a file, but not for the file permission, which is OK > because it could be bypassed by coping the content of the file and > mprotecting it anyway. For a consistent memory execution control, all > memory mapping need to be restricted, which is out of scope for this > patch series. > Ok. > > A detailed user case will help demonstrate the use case for dynamic > > linker, e.g. what kind of app will benefit from > > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we > > dealing with , what kind of attack chain we blocked as a result. > > I explained that in the patches and in the description of these new > securebits. Please point which part is not clear. The full threat > model is simple: the TCB includes the kernel and system's files, which > are integrity-protected, but we don't trust arbitrary data/scripts that > can be written to user-owned files or directly provided to script > interpreters. As for the ptrace restrictions, the dynamic linker > restrictions helps to avoid trivial bypasses (e.g. with LD_PRELOAD) > with consistent executability checks. > On elf loading case, I'm clear after your last email. However, I'm not sure if everyone else follows, I will try to summarize here: - Problem: ld.so /tmp/a.out will happily pass, even /tmp/a.out is mounted as non-exec. Solution: ld.so call execveat(AT_CHECK) for a.out before mmap a.out into memory. - Problem: a poorly built application (a.out) can have a dependency on /tmp/a.o, when /tmp/a.o is on non-exec mount, Solution: ld.so call execveat(AT_CHECK) for a.o, before mmap a.o into memory. - Problem: application can call mmap (/tmp/a.out, rx), where /tmp is on non-exec mount This is out of scope, i.e. will require enforcement on mmap(), maybe through LSM Thanks Best regards -Jeff -Jeff > > > > > > > > > > Thanks > > > > -Jeff > > > > > > > > > With the information that a script interpreter is about to interpret a > > > > > script, an LSM security policy can adjust caller's access rights or log > > > > > execution request as for native script execution (e.g. role transition). > > > > > This is possible thanks to the call to security_bprm_creds_for_exec(). > > > > > > > > > > Because LSMs may only change bprm's credentials, use of AT_CHECK with > > > > > current kernel code should not be a security issue (e.g. unexpected role > > > > > transition). LSMs willing to update the caller's credential could now > > > > > do so when bprm->is_check is set. Of course, such policy change should > > > > > be in line with the new user space code. > > > > > > > > > > Because AT_CHECK is dedicated to user space interpreters, it doesn't > > > > > make sense for the kernel to parse the checked files, look for > > > > > interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC > > > > > if the format is unknown. Because of that, security_bprm_check() is > > > > > never called when AT_CHECK is used. > > > > > > > > > > It should be noted that script interpreters cannot directly use > > > > > execveat(2) (without this new AT_CHECK flag) because this could lead to > > > > > unexpected behaviors e.g., `python script.sh` could lead to Bash being > > > > > executed to interpret the script. Unlike the kernel, script > > > > > interpreters may just interpret the shebang as a simple comment, which > > > > > should not change for backward compatibility reasons. > > > > > > > > > > Because scripts or libraries files might not currently have the > > > > > executable permission set, or because we might want specific users to be > > > > > allowed to run arbitrary scripts, the following patch provides a dynamic > > > > > configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and > > > > > SECBIT_SHOULD_EXEC_RESTRICT securebits. > > > > > > > > > > This is a redesign of the CLIP OS 4's O_MAYEXEC: > > > > > https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch > > > > > This patch has been used for more than a decade with customized script > > > > > interpreters. Some examples can be found here: > > > > > https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC > > > > > > > > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > > > > Cc: Christian Brauner <brauner@kernel.org> > > > > > Cc: Kees Cook <keescook@chromium.org> > > > > > Cc: Paul Moore <paul@paul-moore.com> > > > > > Link: https://docs.python.org/3/library/io.html#io.open_code [1] > > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net> > > > > > Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net > > > > > --- > > > > > > > > > > New design since v18: > > > > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net > > > > > --- > > > > > fs/exec.c | 5 +++-- > > > > > include/linux/binfmts.h | 7 ++++++- > > > > > include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++ > > > > > kernel/audit.h | 1 + > > > > > kernel/auditsc.c | 1 + > > > > > 5 files changed, 41 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/fs/exec.c b/fs/exec.c > > > > > index 40073142288f..ea2a1867afdc 100644 > > > > > --- a/fs/exec.c > > > > > +++ b/fs/exec.c > > > > > @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) > > > > > .lookup_flags = LOOKUP_FOLLOW, > > > > > }; > > > > > > > > > > - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) > > > > > + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0) > > > > > return ERR_PTR(-EINVAL); > > > > > if (flags & AT_SYMLINK_NOFOLLOW) > > > > > open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; > > > > > @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl > > > > > bprm->filename = bprm->fdpath; > > > > > } > > > > > bprm->interp = bprm->filename; > > > > > + bprm->is_check = !!(flags & AT_CHECK); > > > > > > > > > > retval = bprm_mm_init(bprm); > > > > > if (!retval) > > > > > @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm) > > > > > > > > > > /* Set the unchanging part of bprm->cred */ > > > > > retval = security_bprm_creds_for_exec(bprm); > > > > > - if (retval) > > > > > + if (retval || bprm->is_check) > > > > > goto out; > > > > > > > > > > retval = exec_binprm(bprm); > > > > > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > > > > > index 70f97f685bff..8ff9c9e33aed 100644 > > > > > --- a/include/linux/binfmts.h > > > > > +++ b/include/linux/binfmts.h > > > > > @@ -42,7 +42,12 @@ struct linux_binprm { > > > > > * Set when errors can no longer be returned to the > > > > > * original userspace. > > > > > */ > > > > > - point_of_no_return:1; > > > > > + point_of_no_return:1, > > > > > + /* > > > > > + * Set by user space to check executability according to the > > > > > + * caller's environment. > > > > > + */ > > > > > + is_check:1; > > > > > struct file *executable; /* Executable to pass to the interpreter */ > > > > > struct file *interpreter; > > > > > struct file *file; > > > > > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h > > > > > index c0bcc185fa48..bcd05c59b7df 100644 > > > > > --- a/include/uapi/linux/fcntl.h > > > > > +++ b/include/uapi/linux/fcntl.h > > > > > @@ -118,6 +118,36 @@ > > > > > #define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to > > > > > compare object identity and may not > > > > > be usable to open_by_handle_at(2) */ > > > > > + > > > > > +/* > > > > > + * AT_CHECK only performs a check on a regular file and returns 0 if execution > > > > > + * of this file would be allowed, ignoring the file format and then the related > > > > > + * interpreter dependencies (e.g. ELF libraries, script's shebang). AT_CHECK > > > > > + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling > > > > > + * thread. See securebits.h documentation. > > > > > + * > > > > > + * Programs should use this check to apply kernel-level checks against files > > > > > + * that are not directly executed by the kernel but directly passed to a user > > > > > + * space interpreter instead. All files that contain executable code, from the > > > > > + * point of view of the interpreter, should be checked. The main purpose of > > > > > + * this flag is to improve the security and consistency of an execution > > > > > + * environment to ensure that direct file execution (e.g. ./script.sh) and > > > > > + * indirect file execution (e.g. sh script.sh) lead to the same result. For > > > > > + * instance, this can be used to check if a file is trustworthy according to > > > > > + * the caller's environment. > > > > > + * > > > > > + * In a secure environment, libraries and any executable dependencies should > > > > > + * also be checked. For instance dynamic linking should make sure that all > > > > > + * libraries are allowed for execution to avoid trivial bypass (e.g. using > > > > > + * LD_PRELOAD). For such secure execution environment to make sense, only > > > > > + * trusted code should be executable, which also requires integrity guarantees. > > > > > + * > > > > > + * To avoid race conditions leading to time-of-check to time-of-use issues, > > > > > + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file > > > > > + * descriptor instead of a path. > > > > > + */ > > > > > +#define AT_CHECK 0x10000 > > > > > + > > > > > #if defined(__KERNEL__) > > > > > #define AT_GETATTR_NOSEC 0x80000000 > > > > > #endif > > > > > diff --git a/kernel/audit.h b/kernel/audit.h > > > > > index a60d2840559e..8ebdabd2ab81 100644 > > > > > --- a/kernel/audit.h > > > > > +++ b/kernel/audit.h > > > > > @@ -197,6 +197,7 @@ struct audit_context { > > > > > struct open_how openat2; > > > > > struct { > > > > > int argc; > > > > > + bool is_check; > > > > > } execve; > > > > > struct { > > > > > char *name; > > > > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c > > > > > index 6f0d6fb6523f..b6316e284342 100644 > > > > > --- a/kernel/auditsc.c > > > > > +++ b/kernel/auditsc.c > > > > > @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm) > > > > > > > > > > context->type = AUDIT_EXECVE; > > > > > context->execve.argc = bprm->argc; > > > > > + context->execve.is_check = bprm->is_check; > > > > > } > > > > > > > > > > > > > > > -- > > > > > 2.45.2 > > > > > > > > > > >
On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote: > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > > allowed for execution. The main use case is for script interpreters and > > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > > which script (instead of interpreter) accessed a file. As any > > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > > real execution, user space gets the same error codes. > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > > exec, shared object, script and config file (such as seccomp config), > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > > > > sure it passes AT_CHECK, before loading it into memory. > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which > > > > perform the main executability checks (with the __FMODE_EXEC flag). > > > > Did I miss something? > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. > > > The app can choose its own dynamic linker path during build, (maybe > > > even statically link one ?) This is another reason that relying on a > > > userspace only is not enough. > > > > The kernel calls open_exec() on all dependencies, including > > ld-linux-x86-64.so.2, so these files are checked for executability too. > > > This might not be entirely true. iiuc, kernel calls open_exec for > open_exec for interpreter, but not all its dependency (e.g. libc.so.6) Correct, the dynamic linker is in charge of that, which is why it must be enlighten with execveat+AT_CHECK and securebits checks. > load_elf_binary() { > interpreter = open_exec(elf_interpreter); > } > > libc.so.6 is opened and mapped by dynamic linker. > so the call sequence is: > execve(a.out) > - open exec(a.out) > - security_bprm_creds(a.out) > - open the exec(ld.so) > - call open_exec() for interruptor (ld.so) > - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through > the same check and code path as libc.so below ? open_exec() checks are enough. LSMs can use this information (open + __FMODE_EXEC) if needed. execveat+AT_CHECK is only a user space request. > - transfer the control to ld.so) > - ld.so open (libc.so) > - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch, > require dynamic linker change. > - ld.so mmap(libc.so,rx) Explaining these steps is useful. I'll include that in the next patch series. > > > A detailed user case will help demonstrate the use case for dynamic > > > linker, e.g. what kind of app will benefit from > > > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we > > > dealing with , what kind of attack chain we blocked as a result. > > > > I explained that in the patches and in the description of these new > > securebits. Please point which part is not clear. The full threat > > model is simple: the TCB includes the kernel and system's files, which > > are integrity-protected, but we don't trust arbitrary data/scripts that > > can be written to user-owned files or directly provided to script > > interpreters. As for the ptrace restrictions, the dynamic linker > > restrictions helps to avoid trivial bypasses (e.g. with LD_PRELOAD) > > with consistent executability checks. > > > On elf loading case, I'm clear after your last email. However, I'm not > sure if everyone else follows, I will try to summarize here: > - Problem: ld.so /tmp/a.out will happily pass, even /tmp/a.out is > mounted as non-exec. > Solution: ld.so call execveat(AT_CHECK) for a.out before mmap a.out > into memory. > > - Problem: a poorly built application (a.out) can have a dependency on > /tmp/a.o, when /tmp/a.o is on non-exec mount, > Solution: ld.so call execveat(AT_CHECK) for a.o, before mmap a.o into memory. > > - Problem: application can call mmap (/tmp/a.out, rx), where /tmp is > on non-exec mount I'd say "malicious or non-enlightened processes" can call mmap without execveat+AT_CHECK... > This is out of scope, i.e. will require enforcement on mmap(), maybe > through LSM Cool, I'll include that as well. Thanks.
On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote: > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > > > allowed for execution. The main use case is for script interpreters and > > > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > > > which script (instead of interpreter) accessed a file. As any > > > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > > > real execution, user space gets the same error codes. > > > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > > > > > sure it passes AT_CHECK, before loading it into memory. > > > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which > > > > > perform the main executability checks (with the __FMODE_EXEC flag). > > > > > Did I miss something? > > > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. > > > > The app can choose its own dynamic linker path during build, (maybe > > > > even statically link one ?) This is another reason that relying on a > > > > userspace only is not enough. > > > > > > The kernel calls open_exec() on all dependencies, including > > > ld-linux-x86-64.so.2, so these files are checked for executability too. > > > > > This might not be entirely true. iiuc, kernel calls open_exec for > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6) > > Correct, the dynamic linker is in charge of that, which is why it must > be enlighten with execveat+AT_CHECK and securebits checks. > > > load_elf_binary() { > > interpreter = open_exec(elf_interpreter); > > } > > > > libc.so.6 is opened and mapped by dynamic linker. > > so the call sequence is: > > execve(a.out) > > - open exec(a.out) > > - security_bprm_creds(a.out) > > - open the exec(ld.so) > > - call open_exec() for interruptor (ld.so) > > - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through > > the same check and code path as libc.so below ? > > open_exec() checks are enough. LSMs can use this information (open + > __FMODE_EXEC) if needed. execveat+AT_CHECK is only a user space > request. > Then the ld.so doesn't go through the same security_bprm_creds() check as other .so. As my previous email, the ChromeOS LSM restricts executable mfd through security_bprm_creds(), the end result is that ld.so can still be executable memfd, but not other .so. One way to address this is to refactor the necessary code from execveat() code patch, and make it available to call from both kernel and execveat() code paths., but if we do that, we might as well use faccessat2(AT_CHECK) > > - transfer the control to ld.so) > > - ld.so open (libc.so) > > - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch, > > require dynamic linker change. > > - ld.so mmap(libc.so,rx) > > Explaining these steps is useful. I'll include that in the next patch > series. > > > > > A detailed user case will help demonstrate the use case for dynamic > > > > linker, e.g. what kind of app will benefit from > > > > SECBIT_EXEC_RESTRICT_FILE = 1, what kind of threat model are we > > > > dealing with , what kind of attack chain we blocked as a result. > > > > > > I explained that in the patches and in the description of these new > > > securebits. Please point which part is not clear. The full threat > > > model is simple: the TCB includes the kernel and system's files, which > > > are integrity-protected, but we don't trust arbitrary data/scripts that > > > can be written to user-owned files or directly provided to script > > > interpreters. As for the ptrace restrictions, the dynamic linker > > > restrictions helps to avoid trivial bypasses (e.g. with LD_PRELOAD) > > > with consistent executability checks. > > > > > On elf loading case, I'm clear after your last email. However, I'm not > > sure if everyone else follows, I will try to summarize here: > > - Problem: ld.so /tmp/a.out will happily pass, even /tmp/a.out is > > mounted as non-exec. > > Solution: ld.so call execveat(AT_CHECK) for a.out before mmap a.out > > into memory. > > > > - Problem: a poorly built application (a.out) can have a dependency on > > /tmp/a.o, when /tmp/a.o is on non-exec mount, > > Solution: ld.so call execveat(AT_CHECK) for a.o, before mmap a.o into memory. > > > > - Problem: application can call mmap (/tmp/a.out, rx), where /tmp is > > on non-exec mount > > I'd say "malicious or non-enlightened processes" can call mmap without > execveat+AT_CHECK... > > > This is out of scope, i.e. will require enforcement on mmap(), maybe > > through LSM > > Cool, I'll include that as well. Thanks.
On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote: > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote: > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > > > > allowed for execution. The main use case is for script interpreters and > > > > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > > > > which script (instead of interpreter) accessed a file. As any > > > > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > > > > real execution, user space gets the same error codes. > > > > > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > > > > > > sure it passes AT_CHECK, before loading it into memory. > > > > > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which > > > > > > perform the main executability checks (with the __FMODE_EXEC flag). > > > > > > Did I miss something? > > > > > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. > > > > > The app can choose its own dynamic linker path during build, (maybe > > > > > even statically link one ?) This is another reason that relying on a > > > > > userspace only is not enough. > > > > > > > > The kernel calls open_exec() on all dependencies, including > > > > ld-linux-x86-64.so.2, so these files are checked for executability too. > > > > > > > This might not be entirely true. iiuc, kernel calls open_exec for > > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6) > > > > Correct, the dynamic linker is in charge of that, which is why it must > > be enlighten with execveat+AT_CHECK and securebits checks. > > > > > load_elf_binary() { > > > interpreter = open_exec(elf_interpreter); > > > } > > > > > > libc.so.6 is opened and mapped by dynamic linker. > > > so the call sequence is: > > > execve(a.out) > > > - open exec(a.out) > > > - security_bprm_creds(a.out) > > > - open the exec(ld.so) > > > - call open_exec() for interruptor (ld.so) > > > - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through > > > the same check and code path as libc.so below ? > > > > open_exec() checks are enough. LSMs can use this information (open + > > __FMODE_EXEC) if needed. execveat+AT_CHECK is only a user space > > request. > > > Then the ld.so doesn't go through the same security_bprm_creds() check > as other .so. Indeed, but... > > As my previous email, the ChromeOS LSM restricts executable mfd > through security_bprm_creds(), the end result is that ld.so can still > be executable memfd, but not other .so. The chromeOS LSM can check that with the security_file_open() hook and the __FMODE_EXEC flag, see Landlock's implementation. I think this should be the only hook implementation that chromeOS LSM needs to add. > > One way to address this is to refactor the necessary code from > execveat() code patch, and make it available to call from both kernel > and execveat() code paths., but if we do that, we might as well use > faccessat2(AT_CHECK) That's why I think it makes sense to rely on the existing __FMODE_EXEC information. > > > > > - transfer the control to ld.so) > > > - ld.so open (libc.so) > > > - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch, > > > require dynamic linker change. > > > - ld.so mmap(libc.so,rx) > > > > Explaining these steps is useful. I'll include that in the next patch > > series.
On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > allowed for execution. The main use case is for script interpreters and > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > which script (instead of interpreter) accessed a file. As any > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > real execution, user space gets the same error codes. > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > "config file" that contains executable code. > > > > > Is seccomp config considered as "contains executable code", seccomp > > config is translated into bpf, so maybe yes ? but bpf is running in > > the kernel. > > Because seccomp filters alter syscalls, they are similar to code > injection. > > > > > > > I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) in > > > > different use cases: > > > > > > > > execveat clearly has less code change, but that also means: we can't > > > > add logic specific to exec (i.e. logic that can't be applied to > > > > config) for this part (from do_execveat_common to > > > > security_bprm_creds_for_exec) in future. This would require some > > > > agreement/sign-off, I'm not sure from whom. > > > > > > I'm not sure to follow. We could still add new flags, but for now I > > > don't see use cases. This patch series is not meant to handle all > > > possible "trust checks", only executable code, which makes sense for the > > > kernel. > > > > > I guess the "configfile" discussion is where I get confused, at one > > point, I think this would become a generic "trust checks" api for > > everything related to "generating executable code", e.g. javascript, > > java code, and more. > > We will want to clearly define the scope of execveat(AT_CHECK) > > The line between data and code is blurry. For instance, a configuration > file can impact the execution flow of a program. So, where to draw the > line? > > It might makes sense to follow the kernel and interpreter semantic: if a > file can be executed by the kernel (e.g. ELF binary, file containing a > shebang, or just configured with binfmt_misc), then this should be > considered as executable code. This applies to Bash, Python, > Javascript, NodeJS, PE, PHP... However, we can also make a picture > executable with binfmt_misc. So, again, where to draw the line? > > I'd recommend to think about interaction with the outside, through > function calls, IPCs, syscalls... For instance, "running" an image > should not lead to reading or writing to arbitrary files, or accessing > the network, but in practice it is legitimate for some file formats... > PostScript is a programming language, but mostly used to draw pictures. > So, again, where to draw the line? > The javascript is run by browser and java code by java runtime, do they meet the criteria? they do not interact with the kernel directly, however they might have the same "executable" characteristics and the app might not want them to be put into non-exec mount. If the answer is yes, they can also use execveat(AT_CHECK), the next question is: does it make sense for javacript/java code to go through execveat() code path, allocate bprm, etc ? (I don't have answer, maybe it is) > We should follow the principle of least astonishment. What most users > would expect? This should follow the *common usage* of executable > files. At the end, the script interpreters will be patched by security > folks for security reasons. I think the right question to ask should > be: could this file format be (ab)used to leak or modify arbitrary > files, or to perform arbitrary syscalls? If the answer is yes, then it > should be checked for executability. Of course, this excludes bugs > exploited in the file format parser. > > I'll extend the next patch series with this rationale. >
On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote: > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote: > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > > > > > allowed for execution. The main use case is for script interpreters and > > > > > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > > > > > which script (instead of interpreter) accessed a file. As any > > > > > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > > > > > real execution, user space gets the same error codes. > > > > > > > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > > > > > > > sure it passes AT_CHECK, before loading it into memory. > > > > > > > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which > > > > > > > perform the main executability checks (with the __FMODE_EXEC flag). > > > > > > > Did I miss something? > > > > > > > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. > > > > > > The app can choose its own dynamic linker path during build, (maybe > > > > > > even statically link one ?) This is another reason that relying on a > > > > > > userspace only is not enough. > > > > > > > > > > The kernel calls open_exec() on all dependencies, including > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too. > > > > > > > > > This might not be entirely true. iiuc, kernel calls open_exec for > > > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6) > > > > > > Correct, the dynamic linker is in charge of that, which is why it must > > > be enlighten with execveat+AT_CHECK and securebits checks. > > > > > > > load_elf_binary() { > > > > interpreter = open_exec(elf_interpreter); > > > > } > > > > > > > > libc.so.6 is opened and mapped by dynamic linker. > > > > so the call sequence is: > > > > execve(a.out) > > > > - open exec(a.out) > > > > - security_bprm_creds(a.out) > > > > - open the exec(ld.so) > > > > - call open_exec() for interruptor (ld.so) > > > > - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through > > > > the same check and code path as libc.so below ? > > > > > > open_exec() checks are enough. LSMs can use this information (open + > > > __FMODE_EXEC) if needed. execveat+AT_CHECK is only a user space > > > request. > > > > > Then the ld.so doesn't go through the same security_bprm_creds() check > > as other .so. > > Indeed, but... > My point is: we will want all the .so going through the same code path, so security_ functions are called consistently across all the objects, And in the future, if we want to develop additional LSM functionality based on AT_CHECK, it will be applied to all objects. Another thing to consider is: we are asking userspace to make additional syscall before loading the file into memory/get executed, there is a possibility for future expansion of the mechanism, without asking user space to add another syscall again. I m still not convinced yet that execveat(AT_CHECK) fits more than faccessat(AT_CHECK) > > > > As my previous email, the ChromeOS LSM restricts executable mfd > > through security_bprm_creds(), the end result is that ld.so can still > > be executable memfd, but not other .so. > > The chromeOS LSM can check that with the security_file_open() hook and > the __FMODE_EXEC flag, see Landlock's implementation. I think this > should be the only hook implementation that chromeOS LSM needs to add. > > > > > One way to address this is to refactor the necessary code from > > execveat() code patch, and make it available to call from both kernel > > and execveat() code paths., but if we do that, we might as well use > > faccessat2(AT_CHECK) > > That's why I think it makes sense to rely on the existing __FMODE_EXEC > information. > > > > > > > > > - transfer the control to ld.so) > > > > - ld.so open (libc.so) > > > > - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch, > > > > require dynamic linker change. > > > > - ld.so mmap(libc.so,rx) > > > > > > Explaining these steps is useful. I'll include that in the next patch > > > series.
On Fri, Jul 19, 2024 at 08:12:37AM -0700, Jeff Xu wrote: > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > > allowed for execution. The main use case is for script interpreters and > > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > > which script (instead of interpreter) accessed a file. As any > > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > > real execution, user space gets the same error codes. > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > > > "config file" that contains executable code. > > > > > > > Is seccomp config considered as "contains executable code", seccomp > > > config is translated into bpf, so maybe yes ? but bpf is running in > > > the kernel. > > > > Because seccomp filters alter syscalls, they are similar to code > > injection. > > > > > > > > > > I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) in > > > > > different use cases: > > > > > > > > > > execveat clearly has less code change, but that also means: we can't > > > > > add logic specific to exec (i.e. logic that can't be applied to > > > > > config) for this part (from do_execveat_common to > > > > > security_bprm_creds_for_exec) in future. This would require some > > > > > agreement/sign-off, I'm not sure from whom. > > > > > > > > I'm not sure to follow. We could still add new flags, but for now I > > > > don't see use cases. This patch series is not meant to handle all > > > > possible "trust checks", only executable code, which makes sense for the > > > > kernel. > > > > > > > I guess the "configfile" discussion is where I get confused, at one > > > point, I think this would become a generic "trust checks" api for > > > everything related to "generating executable code", e.g. javascript, > > > java code, and more. > > > We will want to clearly define the scope of execveat(AT_CHECK) > > > > The line between data and code is blurry. For instance, a configuration > > file can impact the execution flow of a program. So, where to draw the > > line? > > > > It might makes sense to follow the kernel and interpreter semantic: if a > > file can be executed by the kernel (e.g. ELF binary, file containing a > > shebang, or just configured with binfmt_misc), then this should be > > considered as executable code. This applies to Bash, Python, > > Javascript, NodeJS, PE, PHP... However, we can also make a picture > > executable with binfmt_misc. So, again, where to draw the line? > > > > I'd recommend to think about interaction with the outside, through > > function calls, IPCs, syscalls... For instance, "running" an image > > should not lead to reading or writing to arbitrary files, or accessing > > the network, but in practice it is legitimate for some file formats... > > PostScript is a programming language, but mostly used to draw pictures. > > So, again, where to draw the line? > > > The javascript is run by browser and java code by java runtime, do > they meet the criteria? they do not interact with the kernel directly, > however they might have the same "executable" characteristics and the > app might not want them to be put into non-exec mount. > > If the answer is yes, they can also use execveat(AT_CHECK), the next > question is: does it make sense for javacript/java code to go through > execveat() code path, allocate bprm, etc ? (I don't have answer, maybe > it is) Java and NodeJS can do arbitrary syscalls (through their runtime) and they can access arbitrary files, so according to my below comment, yes they should be managed as potentially dangerous executable code. The question should be: is this code trusted? Most of the time it is not, hence the security model of web browser and their heavy use of sandboxing. So no, I don't think it would make sense to check this kind of code more than what the browser already do. I'll talk about this use case in the next patch series. > > > We should follow the principle of least astonishment. What most users > > would expect? This should follow the *common usage* of executable > > files. At the end, the script interpreters will be patched by security > > folks for security reasons. I think the right question to ask should > > be: could this file format be (ab)used to leak or modify arbitrary > > files, or to perform arbitrary syscalls? If the answer is yes, then it > > should be checked for executability. Of course, this excludes bugs > > exploited in the file format parser. > > > > I'll extend the next patch series with this rationale. > > >
On Fri, Jul 19, 2024 at 8:31 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Fri, Jul 19, 2024 at 08:12:37AM -0700, Jeff Xu wrote: > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > > > allowed for execution. The main use case is for script interpreters and > > > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > > > which script (instead of interpreter) accessed a file. As any > > > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > > > real execution, user space gets the same error codes. > > > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > > > > > "config file" that contains executable code. > > > > > > > > > Is seccomp config considered as "contains executable code", seccomp > > > > config is translated into bpf, so maybe yes ? but bpf is running in > > > > the kernel. > > > > > > Because seccomp filters alter syscalls, they are similar to code > > > injection. > > > > > > > > > > > > > I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) in > > > > > > different use cases: > > > > > > > > > > > > execveat clearly has less code change, but that also means: we can't > > > > > > add logic specific to exec (i.e. logic that can't be applied to > > > > > > config) for this part (from do_execveat_common to > > > > > > security_bprm_creds_for_exec) in future. This would require some > > > > > > agreement/sign-off, I'm not sure from whom. > > > > > > > > > > I'm not sure to follow. We could still add new flags, but for now I > > > > > don't see use cases. This patch series is not meant to handle all > > > > > possible "trust checks", only executable code, which makes sense for the > > > > > kernel. > > > > > > > > > I guess the "configfile" discussion is where I get confused, at one > > > > point, I think this would become a generic "trust checks" api for > > > > everything related to "generating executable code", e.g. javascript, > > > > java code, and more. > > > > We will want to clearly define the scope of execveat(AT_CHECK) > > > > > > The line between data and code is blurry. For instance, a configuration > > > file can impact the execution flow of a program. So, where to draw the > > > line? > > > > > > It might makes sense to follow the kernel and interpreter semantic: if a > > > file can be executed by the kernel (e.g. ELF binary, file containing a > > > shebang, or just configured with binfmt_misc), then this should be > > > considered as executable code. This applies to Bash, Python, > > > Javascript, NodeJS, PE, PHP... However, we can also make a picture > > > executable with binfmt_misc. So, again, where to draw the line? > > > > > > I'd recommend to think about interaction with the outside, through > > > function calls, IPCs, syscalls... For instance, "running" an image > > > should not lead to reading or writing to arbitrary files, or accessing > > > the network, but in practice it is legitimate for some file formats... > > > PostScript is a programming language, but mostly used to draw pictures. > > > So, again, where to draw the line? > > > > > The javascript is run by browser and java code by java runtime, do > > they meet the criteria? they do not interact with the kernel directly, > > however they might have the same "executable" characteristics and the > > app might not want them to be put into non-exec mount. > > > > If the answer is yes, they can also use execveat(AT_CHECK), the next > > question is: does it make sense for javacript/java code to go through > > execveat() code path, allocate bprm, etc ? (I don't have answer, maybe > > it is) > > Java and NodeJS can do arbitrary syscalls (through their runtime) and > they can access arbitrary files, so according to my below comment, yes > they should be managed as potentially dangerous executable code. > > The question should be: is this code trusted? Most of the time it is > not, hence the security model of web browser and their heavy use of > sandboxing. So no, I don't think it would make sense to check this kind > of code more than what the browser already do. > If I understand you correctly, Java/NodeJS won't use execveat(AT_CHECK), we will leave that work to the web browser/java runtime's sandboxer. This is good because the scope is more narrow/clear. Thanks -Jeff > I'll talk about this use case in the next patch series. > > > > > > We should follow the principle of least astonishment. What most users > > > would expect? This should follow the *common usage* of executable > > > files. At the end, the script interpreters will be patched by security > > > folks for security reasons. I think the right question to ask should > > > be: could this file format be (ab)used to leak or modify arbitrary > > > files, or to perform arbitrary syscalls? If the answer is yes, then it > > > should be checked for executability. Of course, this excludes bugs > > > exploited in the file format parser. > > > > > > I'll extend the next patch series with this rationale. > > > > >
> On Jul 18, 2024, at 8:22 PM, Mickaël Salaün <mic@digikod.net> wrote: > > On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote: >>>> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote: >>> >>> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote: >>>>> On 17/07/2024 07:33, Jeff Xu wrote: >>>>> Consider those cases: I think: >>>>> a> relying purely on userspace for enforcement does't seem to be >>>>> effective, e.g. it is trivial to call open(), then mmap() it into >>>>> executable memory. >>>> >>>> If there's a way to do this without running executable code that had to pass >>>> a previous execveat() check, then yeah, it's not effective (e.g. a Python >>>> interpreter that *doesn't* enforce execveat() is a trivial way to do it). >>>> >>>> Once arbitrary code is running, all bets are off. So long as all arbitrary >>>> code is being checked itself, it's allowed to do things that would bypass >>>> later checks (and it's up to whoever audited it in the first place to >>>> prevent this by not giving it the special mark that allows it to pass the >>>> check). >>> >>> Exactly. As explained in the patches, one crucial prerequisite is that >>> the executable code is trusted, and the system must provide integrity >>> guarantees. We cannot do anything without that. This patches series is >>> a building block to fix a blind spot on Linux systems to be able to >>> fully control executability. >> >> Circling back to my previous comment (did that ever get noticed?), I > > Yes, I replied to your comments. Did I miss something? I missed that email in the pile, sorry. I’ll reply separately. > >> don’t think this is quite right: >> >> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/ >> >> On a basic system configuration, a given path either may or may not be >> executed. And maybe that path has some integrity check (dm-verity, >> etc). So the kernel should tell the interpreter/loader whether the >> target may be executed. All fine. >> >> But I think the more complex cases are more interesting, and the >> “execute a program” process IS NOT BINARY. An attempt to execute can >> be rejected outright, or it can be allowed *with a change to creds or >> security context*. It would be entirely reasonable to have a policy >> that allows execution of non-integrity-checked files but in a very >> locked down context only. > > I guess you mean to transition to a sandbox when executing an untrusted > file. This is a good idea. I talked about role transition in the > patch's description: > > With the information that a script interpreter is about to interpret a > script, an LSM security policy can adjust caller's access rights or log > execution request as for native script execution (e.g. role transition). > This is possible thanks to the call to security_bprm_creds_for_exec(). … > This patch series brings the minimal building blocks to have a > consistent execution environment. Role transitions for script execution > are left to LSMs. For instance, we could extend Landlock to > automatically sandbox untrusted scripts. I’m not really convinced. There’s more to building an API that enables LSM hooks than merely sticking the hook somewhere in kernel code. It needs to be a defined API. If you call an operation “check”, then people will expect it to check, not to change the caller’s credentials. And people will mess it up in both directions (e.g. callers will call it and then open try to load some library that they should have loaded first, or callers will call it and forget to close fds first. And there should probably be some interaction with dumpable as well. If I “check” a file for executability, that should not suddenly allow someone to ptrace me? And callers need to know to exit on failure, not carry on. More concretely, a runtime that fully opts in to this may well "check" multiple things. For example, if I do: $ ld.so ~/.local/bin/some_program (i.e. I literally execve ld.so) then ld.so will load several things: ~/.local/bin/some_program libc.so other random DSOs, some of which may well be in my home directory And for all ld.so knows, some_program is actually an interpreter and will "check" something else. And the LSMs have absolutely no clue what's what. So I think for this to work right, the APIs need to be a lot more expressive and explicit: check_library(fd to libc.so); <-- does not transition or otherwise drop privs check_transition_main_program(fd to ~/.local/bin/some_program); <-- may drop privs and if some_program is really an interpreter, then it will do: check_library(fd to some thing imported by the script); check_transition_main_program(fd to the actual script); And maybe that takes a parameter that gets run eval-style: check_unsafe_user_script("actual contents of snippet"); The actual spelling of all this doesn't matter so much. But the user code and the kernel code need to be on the same page as to what the user program is doing and what it's asking the kernel program to do. --Andy
On Sat Jul 20, 2024 at 4:59 AM EEST, Andy Lutomirski wrote: > > On Jul 18, 2024, at 8:22 PM, Mickaël Salaün <mic@digikod.net> wrote: > > > > On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote: > >>>> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote: > >>> > >>> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote: > >>>>> On 17/07/2024 07:33, Jeff Xu wrote: > >>>>> Consider those cases: I think: > >>>>> a> relying purely on userspace for enforcement does't seem to be > >>>>> effective, e.g. it is trivial to call open(), then mmap() it into > >>>>> executable memory. > >>>> > >>>> If there's a way to do this without running executable code that had to pass > >>>> a previous execveat() check, then yeah, it's not effective (e.g. a Python > >>>> interpreter that *doesn't* enforce execveat() is a trivial way to do it). > >>>> > >>>> Once arbitrary code is running, all bets are off. So long as all arbitrary > >>>> code is being checked itself, it's allowed to do things that would bypass > >>>> later checks (and it's up to whoever audited it in the first place to > >>>> prevent this by not giving it the special mark that allows it to pass the > >>>> check). > >>> > >>> Exactly. As explained in the patches, one crucial prerequisite is that > >>> the executable code is trusted, and the system must provide integrity > >>> guarantees. We cannot do anything without that. This patches series is > >>> a building block to fix a blind spot on Linux systems to be able to > >>> fully control executability. > >> > >> Circling back to my previous comment (did that ever get noticed?), I > > > > Yes, I replied to your comments. Did I miss something? > > I missed that email in the pile, sorry. I’ll reply separately. > > > > >> don’t think this is quite right: > >> > >> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/ > >> > >> On a basic system configuration, a given path either may or may not be > >> executed. And maybe that path has some integrity check (dm-verity, > >> etc). So the kernel should tell the interpreter/loader whether the > >> target may be executed. All fine. > >> > >> But I think the more complex cases are more interesting, and the > >> “execute a program” process IS NOT BINARY. An attempt to execute can > >> be rejected outright, or it can be allowed *with a change to creds or > >> security context*. It would be entirely reasonable to have a policy > >> that allows execution of non-integrity-checked files but in a very > >> locked down context only. > > > > I guess you mean to transition to a sandbox when executing an untrusted > > file. This is a good idea. I talked about role transition in the > > patch's description: > > > > With the information that a script interpreter is about to interpret a > > script, an LSM security policy can adjust caller's access rights or log > > execution request as for native script execution (e.g. role transition). > > This is possible thanks to the call to security_bprm_creds_for_exec(). > > … > > > This patch series brings the minimal building blocks to have a > > consistent execution environment. Role transitions for script execution > > are left to LSMs. For instance, we could extend Landlock to > > automatically sandbox untrusted scripts. > > I’m not really convinced. There’s more to building an API that > enables LSM hooks than merely sticking the hook somewhere in kernel > code. It needs to be a defined API. If you call an operation “check”, > then people will expect it to check, not to change the caller’s > credentials. And people will mess it up in both directions (e.g. > callers will call it and then open try to load some library that they > should have loaded first, or callers will call it and forget to close > fds first. > > And there should probably be some interaction with dumpable as well. > If I “check” a file for executability, that should not suddenly allow > someone to ptrace me? > > And callers need to know to exit on failure, not carry on. > > > More concretely, a runtime that fully opts in to this may well "check" > multiple things. For example, if I do: > > $ ld.so ~/.local/bin/some_program (i.e. I literally execve ld.so) > > then ld.so will load several things: > > ~/.local/bin/some_program > libc.so > other random DSOs, some of which may well be in my home directory What would really help to comprehend this patch set would be a set of test scripts, preferably something that you can run easily with BuildRoot or similar. Scripts would demonstrate the use cases for the patch set. Then it would be easier to develop scripts that would underline the corner cases. I would keep all this out of kselftest shenanigans for now. I feel that the patch set is hovering in abstractions with examples that you cannot execute. I added the patches to standard test CI hack: https://codeberg.org/jarkko/linux-tpmdd-test But after I booted up a kernel I had no idea what to do with it. And all this lenghty discussion makes it even more confusing. Please find some connection to the real world before sending any new version of this (e.g. via test scripts). I think this should not be pulled before almost anyone doing kernel dev can comprehend the "gist" at least in some reasonable level. BR, Jarkko
On Fri, Jul 19, 2024 at 08:27:18AM -0700, Jeff Xu wrote: > On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote: > > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote: > > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > > > > > > allowed for execution. The main use case is for script interpreters and > > > > > > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > > > > > > which script (instead of interpreter) accessed a file. As any > > > > > > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > > > > > > real execution, user space gets the same error codes. > > > > > > > > > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > > > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > > > > > > > > sure it passes AT_CHECK, before loading it into memory. > > > > > > > > > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which > > > > > > > > perform the main executability checks (with the __FMODE_EXEC flag). > > > > > > > > Did I miss something? > > > > > > > > > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. > > > > > > > The app can choose its own dynamic linker path during build, (maybe > > > > > > > even statically link one ?) This is another reason that relying on a > > > > > > > userspace only is not enough. > > > > > > > > > > > > The kernel calls open_exec() on all dependencies, including > > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too. > > > > > > > > > > > This might not be entirely true. iiuc, kernel calls open_exec for > > > > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6) > > > > > > > > Correct, the dynamic linker is in charge of that, which is why it must > > > > be enlighten with execveat+AT_CHECK and securebits checks. > > > > > > > > > load_elf_binary() { > > > > > interpreter = open_exec(elf_interpreter); > > > > > } > > > > > > > > > > libc.so.6 is opened and mapped by dynamic linker. > > > > > so the call sequence is: > > > > > execve(a.out) > > > > > - open exec(a.out) > > > > > - security_bprm_creds(a.out) > > > > > - open the exec(ld.so) > > > > > - call open_exec() for interruptor (ld.so) > > > > > - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through > > > > > the same check and code path as libc.so below ? > > > > > > > > open_exec() checks are enough. LSMs can use this information (open + > > > > __FMODE_EXEC) if needed. execveat+AT_CHECK is only a user space > > > > request. > > > > > > > Then the ld.so doesn't go through the same security_bprm_creds() check > > > as other .so. > > > > Indeed, but... > > > My point is: we will want all the .so going through the same code > path, so security_ functions are called consistently across all the > objects, And in the future, if we want to develop additional LSM > functionality based on AT_CHECK, it will be applied to all objects. I'll extend the doc to encourage LSMs to check for __FMODE_EXEC, which already is the common security check for all executable dependencies. As extra information, they can get explicit requests by looking at execveat+AT_CHECK call. > > Another thing to consider is: we are asking userspace to make > additional syscall before loading the file into memory/get executed, > there is a possibility for future expansion of the mechanism, without > asking user space to add another syscall again. AT_CHECK is defined with a specific semantic. Other mechanisms (e.g. LSM policies) could enforce other restrictions following the same semantic. We need to keep in mind backward compatibility. > > I m still not convinced yet that execveat(AT_CHECK) fits more than > faccessat(AT_CHECK) faccessat2(2) is dedicated to file permission/attribute check. execveat(2) is dedicated to execution, which is a superset of file permission for executability, plus other checks (e.g. noexec). > > > > > > > > As my previous email, the ChromeOS LSM restricts executable mfd > > > through security_bprm_creds(), the end result is that ld.so can still > > > be executable memfd, but not other .so. > > > > The chromeOS LSM can check that with the security_file_open() hook and > > the __FMODE_EXEC flag, see Landlock's implementation. I think this > > should be the only hook implementation that chromeOS LSM needs to add. > > > > > > > > One way to address this is to refactor the necessary code from > > > execveat() code patch, and make it available to call from both kernel > > > and execveat() code paths., but if we do that, we might as well use > > > faccessat2(AT_CHECK) > > > > That's why I think it makes sense to rely on the existing __FMODE_EXEC > > information. > > > > > > > > > > > > > - transfer the control to ld.so) > > > > > - ld.so open (libc.so) > > > > > - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch, > > > > > require dynamic linker change. > > > > > - ld.so mmap(libc.so,rx) > > > > > > > > Explaining these steps is useful. I'll include that in the next patch > > > > series. >
On Fri, Jul 19, 2024 at 10:36:01AM -0700, Jeff Xu wrote: > On Fri, Jul 19, 2024 at 8:31 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > On Fri, Jul 19, 2024 at 08:12:37AM -0700, Jeff Xu wrote: > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > > > > allowed for execution. The main use case is for script interpreters and > > > > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > > > > which script (instead of interpreter) accessed a file. As any > > > > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > > > > real execution, user space gets the same error codes. > > > > > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > > > > > > > "config file" that contains executable code. > > > > > > > > > > > Is seccomp config considered as "contains executable code", seccomp > > > > > config is translated into bpf, so maybe yes ? but bpf is running in > > > > > the kernel. > > > > > > > > Because seccomp filters alter syscalls, they are similar to code > > > > injection. > > > > > > > > > > > > > > > > I'm still thinking execveat(AT_CHECK) vs faccessat(AT_CHECK) in > > > > > > > different use cases: > > > > > > > > > > > > > > execveat clearly has less code change, but that also means: we can't > > > > > > > add logic specific to exec (i.e. logic that can't be applied to > > > > > > > config) for this part (from do_execveat_common to > > > > > > > security_bprm_creds_for_exec) in future. This would require some > > > > > > > agreement/sign-off, I'm not sure from whom. > > > > > > > > > > > > I'm not sure to follow. We could still add new flags, but for now I > > > > > > don't see use cases. This patch series is not meant to handle all > > > > > > possible "trust checks", only executable code, which makes sense for the > > > > > > kernel. > > > > > > > > > > > I guess the "configfile" discussion is where I get confused, at one > > > > > point, I think this would become a generic "trust checks" api for > > > > > everything related to "generating executable code", e.g. javascript, > > > > > java code, and more. > > > > > We will want to clearly define the scope of execveat(AT_CHECK) > > > > > > > > The line between data and code is blurry. For instance, a configuration > > > > file can impact the execution flow of a program. So, where to draw the > > > > line? > > > > > > > > It might makes sense to follow the kernel and interpreter semantic: if a > > > > file can be executed by the kernel (e.g. ELF binary, file containing a > > > > shebang, or just configured with binfmt_misc), then this should be > > > > considered as executable code. This applies to Bash, Python, > > > > Javascript, NodeJS, PE, PHP... However, we can also make a picture > > > > executable with binfmt_misc. So, again, where to draw the line? > > > > > > > > I'd recommend to think about interaction with the outside, through > > > > function calls, IPCs, syscalls... For instance, "running" an image > > > > should not lead to reading or writing to arbitrary files, or accessing > > > > the network, but in practice it is legitimate for some file formats... > > > > PostScript is a programming language, but mostly used to draw pictures. > > > > So, again, where to draw the line? > > > > > > > The javascript is run by browser and java code by java runtime, do > > > they meet the criteria? they do not interact with the kernel directly, > > > however they might have the same "executable" characteristics and the > > > app might not want them to be put into non-exec mount. > > > > > > If the answer is yes, they can also use execveat(AT_CHECK), the next > > > question is: does it make sense for javacript/java code to go through > > > execveat() code path, allocate bprm, etc ? (I don't have answer, maybe > > > it is) > > > > Java and NodeJS can do arbitrary syscalls (through their runtime) and > > they can access arbitrary files, so according to my below comment, yes > > they should be managed as potentially dangerous executable code. > > > > The question should be: is this code trusted? Most of the time it is > > not, hence the security model of web browser and their heavy use of > > sandboxing. So no, I don't think it would make sense to check this kind > > of code more than what the browser already do. > > > > If I understand you correctly, Java/NodeJS won't use > execveat(AT_CHECK), we will leave that work to the web browser/java > runtime's sandboxer. > This is good because the scope is more narrow/clear. Yes for browser's sandboxes because the code comes from the network (i.e. not authenticated at the kernel level, and mostly untrusted). For standalone Java applications (stored in the filesystem), the Java runtime(s) should be patched as other script interpreters. > > Thanks > -Jeff > > > I'll talk about this use case in the next patch series. > > > > > > > > > We should follow the principle of least astonishment. What most users > > > > would expect? This should follow the *common usage* of executable > > > > files. At the end, the script interpreters will be patched by security > > > > folks for security reasons. I think the right question to ask should > > > > be: could this file format be (ab)used to leak or modify arbitrary > > > > files, or to perform arbitrary syscalls? If the answer is yes, then it > > > > should be checked for executability. Of course, this excludes bugs > > > > exploited in the file format parser. > > > > > > > > I'll extend the next patch series with this rationale. > > > > > > >
On Sat, Jul 20, 2024 at 09:59:33AM +0800, Andy Lutomirski wrote: > > On Jul 18, 2024, at 8:22 PM, Mickaël Salaün <mic@digikod.net> wrote: > > > > On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote: > >>>> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote: > >>> > >>> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote: > >>>>> On 17/07/2024 07:33, Jeff Xu wrote: > >>>>> Consider those cases: I think: > >>>>> a> relying purely on userspace for enforcement does't seem to be > >>>>> effective, e.g. it is trivial to call open(), then mmap() it into > >>>>> executable memory. > >>>> > >>>> If there's a way to do this without running executable code that had to pass > >>>> a previous execveat() check, then yeah, it's not effective (e.g. a Python > >>>> interpreter that *doesn't* enforce execveat() is a trivial way to do it). > >>>> > >>>> Once arbitrary code is running, all bets are off. So long as all arbitrary > >>>> code is being checked itself, it's allowed to do things that would bypass > >>>> later checks (and it's up to whoever audited it in the first place to > >>>> prevent this by not giving it the special mark that allows it to pass the > >>>> check). > >>> > >>> Exactly. As explained in the patches, one crucial prerequisite is that > >>> the executable code is trusted, and the system must provide integrity > >>> guarantees. We cannot do anything without that. This patches series is > >>> a building block to fix a blind spot on Linux systems to be able to > >>> fully control executability. > >> > >> Circling back to my previous comment (did that ever get noticed?), I > > > > Yes, I replied to your comments. Did I miss something? > > I missed that email in the pile, sorry. I’ll reply separately. > > > > >> don’t think this is quite right: > >> > >> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/ > >> > >> On a basic system configuration, a given path either may or may not be > >> executed. And maybe that path has some integrity check (dm-verity, > >> etc). So the kernel should tell the interpreter/loader whether the > >> target may be executed. All fine. > >> > >> But I think the more complex cases are more interesting, and the > >> “execute a program” process IS NOT BINARY. An attempt to execute can > >> be rejected outright, or it can be allowed *with a change to creds or > >> security context*. It would be entirely reasonable to have a policy > >> that allows execution of non-integrity-checked files but in a very > >> locked down context only. > > > > I guess you mean to transition to a sandbox when executing an untrusted > > file. This is a good idea. I talked about role transition in the > > patch's description: > > > > With the information that a script interpreter is about to interpret a > > script, an LSM security policy can adjust caller's access rights or log > > execution request as for native script execution (e.g. role transition). > > This is possible thanks to the call to security_bprm_creds_for_exec(). > > … > > > This patch series brings the minimal building blocks to have a > > consistent execution environment. Role transitions for script execution > > are left to LSMs. For instance, we could extend Landlock to > > automatically sandbox untrusted scripts. > > I’m not really convinced. There’s more to building an API that > enables LSM hooks than merely sticking the hook somewhere in kernel > code. It needs to be a defined API. If you call an operation “check”, > then people will expect it to check, not to change the caller’s > credentials. And people will mess it up in both directions (e.g. > callers will call it and then open try to load some library that they > should have loaded first, or callers will call it and forget to close > fds first. > > And there should probably be some interaction with dumpable as well. > If I “check” a file for executability, that should not suddenly allow > someone to ptrace me? > > And callers need to know to exit on failure, not carry on. > > > More concretely, a runtime that fully opts in to this may well "check" > multiple things. For example, if I do: > > $ ld.so ~/.local/bin/some_program (i.e. I literally execve ld.so) > > then ld.so will load several things: > > ~/.local/bin/some_program > libc.so > other random DSOs, some of which may well be in my home directory > > And for all ld.so knows, some_program is actually an interpreter and > will "check" something else. And the LSMs have absolutely no clue > what's what. So I think for this to work right, the APIs need to be a > lot more expressive and explicit: > > check_library(fd to libc.so); <-- does not transition or otherwise drop privs > check_transition_main_program(fd to ~/.local/bin/some_program); <-- > may drop privs > > and if some_program is really an interpreter, then it will do: > > check_library(fd to some thing imported by the script); > check_transition_main_program(fd to the actual script); > > And maybe that takes a parameter that gets run eval-style: > > check_unsafe_user_script("actual contents of snippet"); > > The actual spelling of all this doesn't matter so much. But the user > code and the kernel code need to be on the same page as to what the > user program is doing and what it's asking the kernel program to do. I agree. I'll remove any references to "role transition". This kind of feature should come with something like getpeercon/setexeccon(3). > > --Andy >
On Sat, Jul 20, 2024 at 02:43:41PM +0300, Jarkko Sakkinen wrote: > On Sat Jul 20, 2024 at 4:59 AM EEST, Andy Lutomirski wrote: > > > On Jul 18, 2024, at 8:22 PM, Mickaël Salaün <mic@digikod.net> wrote: > > > > > > On Thu, Jul 18, 2024 at 09:02:56AM +0800, Andy Lutomirski wrote: > > >>>> On Jul 17, 2024, at 6:01 PM, Mickaël Salaün <mic@digikod.net> wrote: > > >>> > > >>> On Wed, Jul 17, 2024 at 09:26:22AM +0100, Steve Dower wrote: > > >>>>> On 17/07/2024 07:33, Jeff Xu wrote: > > >>>>> Consider those cases: I think: > > >>>>> a> relying purely on userspace for enforcement does't seem to be > > >>>>> effective, e.g. it is trivial to call open(), then mmap() it into > > >>>>> executable memory. > > >>>> > > >>>> If there's a way to do this without running executable code that had to pass > > >>>> a previous execveat() check, then yeah, it's not effective (e.g. a Python > > >>>> interpreter that *doesn't* enforce execveat() is a trivial way to do it). > > >>>> > > >>>> Once arbitrary code is running, all bets are off. So long as all arbitrary > > >>>> code is being checked itself, it's allowed to do things that would bypass > > >>>> later checks (and it's up to whoever audited it in the first place to > > >>>> prevent this by not giving it the special mark that allows it to pass the > > >>>> check). > > >>> > > >>> Exactly. As explained in the patches, one crucial prerequisite is that > > >>> the executable code is trusted, and the system must provide integrity > > >>> guarantees. We cannot do anything without that. This patches series is > > >>> a building block to fix a blind spot on Linux systems to be able to > > >>> fully control executability. > > >> > > >> Circling back to my previous comment (did that ever get noticed?), I > > > > > > Yes, I replied to your comments. Did I miss something? > > > > I missed that email in the pile, sorry. I’ll reply separately. > > > > > > > >> don’t think this is quite right: > > >> > > >> https://lore.kernel.org/all/CALCETrWYu=PYJSgyJ-vaa+3BGAry8Jo8xErZLiGR3U5h6+U0tA@mail.gmail.com/ > > >> > > >> On a basic system configuration, a given path either may or may not be > > >> executed. And maybe that path has some integrity check (dm-verity, > > >> etc). So the kernel should tell the interpreter/loader whether the > > >> target may be executed. All fine. > > >> > > >> But I think the more complex cases are more interesting, and the > > >> “execute a program” process IS NOT BINARY. An attempt to execute can > > >> be rejected outright, or it can be allowed *with a change to creds or > > >> security context*. It would be entirely reasonable to have a policy > > >> that allows execution of non-integrity-checked files but in a very > > >> locked down context only. > > > > > > I guess you mean to transition to a sandbox when executing an untrusted > > > file. This is a good idea. I talked about role transition in the > > > patch's description: > > > > > > With the information that a script interpreter is about to interpret a > > > script, an LSM security policy can adjust caller's access rights or log > > > execution request as for native script execution (e.g. role transition). > > > This is possible thanks to the call to security_bprm_creds_for_exec(). > > > > … > > > > > This patch series brings the minimal building blocks to have a > > > consistent execution environment. Role transitions for script execution > > > are left to LSMs. For instance, we could extend Landlock to > > > automatically sandbox untrusted scripts. > > > > I’m not really convinced. There’s more to building an API that > > enables LSM hooks than merely sticking the hook somewhere in kernel > > code. It needs to be a defined API. If you call an operation “check”, > > then people will expect it to check, not to change the caller’s > > credentials. And people will mess it up in both directions (e.g. > > callers will call it and then open try to load some library that they > > should have loaded first, or callers will call it and forget to close > > fds first. > > > > And there should probably be some interaction with dumpable as well. > > If I “check” a file for executability, that should not suddenly allow > > someone to ptrace me? > > > > And callers need to know to exit on failure, not carry on. > > > > > > More concretely, a runtime that fully opts in to this may well "check" > > multiple things. For example, if I do: > > > > $ ld.so ~/.local/bin/some_program (i.e. I literally execve ld.so) > > > > then ld.so will load several things: > > > > ~/.local/bin/some_program > > libc.so > > other random DSOs, some of which may well be in my home directory > > What would really help to comprehend this patch set would be a set of > test scripts, preferably something that you can run easily with > BuildRoot or similar. > > Scripts would demonstrate the use cases for the patch set. Then it > would be easier to develop scripts that would underline the corner > cases. I would keep all this out of kselftest shenanigans for now. I'll include a toy script interpreter with the next patch series. This one was an RFC. > > I feel that the patch set is hovering in abstractions with examples > that you cannot execute. > > I added the patches to standard test CI hack: > > https://codeberg.org/jarkko/linux-tpmdd-test > > But after I booted up a kernel I had no idea what to do with it. And > all this lenghty discussion makes it even more confusing. You can run the tests in the CI. > > Please find some connection to the real world before sending any new > version of this (e.g. via test scripts). I think this should not be > pulled before almost anyone doing kernel dev can comprehend the "gist" > at least in some reasonable level. You'll find in this patch series (cover letter, patch description, and comments) connection to the real world. :) The next patch series should take into account the current discussions. > > BR, Jarkko
On Tue, Jul 23, 2024 at 6:15 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Fri, Jul 19, 2024 at 08:27:18AM -0700, Jeff Xu wrote: > > On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote: > > > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote: > > > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > > > > > > > allowed for execution. The main use case is for script interpreters and > > > > > > > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > > > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > > > > > > > which script (instead of interpreter) accessed a file. As any > > > > > > > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > > > > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > > > > > > > real execution, user space gets the same error codes. > > > > > > > > > > > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > > > > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > > > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > > > > > > > > > sure it passes AT_CHECK, before loading it into memory. > > > > > > > > > > > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which > > > > > > > > > perform the main executability checks (with the __FMODE_EXEC flag). > > > > > > > > > Did I miss something? > > > > > > > > > > > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. > > > > > > > > The app can choose its own dynamic linker path during build, (maybe > > > > > > > > even statically link one ?) This is another reason that relying on a > > > > > > > > userspace only is not enough. > > > > > > > > > > > > > > The kernel calls open_exec() on all dependencies, including > > > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too. > > > > > > > > > > > > > This might not be entirely true. iiuc, kernel calls open_exec for > > > > > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6) > > > > > > > > > > Correct, the dynamic linker is in charge of that, which is why it must > > > > > be enlighten with execveat+AT_CHECK and securebits checks. > > > > > > > > > > > load_elf_binary() { > > > > > > interpreter = open_exec(elf_interpreter); > > > > > > } > > > > > > > > > > > > libc.so.6 is opened and mapped by dynamic linker. > > > > > > so the call sequence is: > > > > > > execve(a.out) > > > > > > - open exec(a.out) > > > > > > - security_bprm_creds(a.out) > > > > > > - open the exec(ld.so) > > > > > > - call open_exec() for interruptor (ld.so) > > > > > > - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through > > > > > > the same check and code path as libc.so below ? > > > > > > > > > > open_exec() checks are enough. LSMs can use this information (open + > > > > > __FMODE_EXEC) if needed. execveat+AT_CHECK is only a user space > > > > > request. > > > > > > > > > Then the ld.so doesn't go through the same security_bprm_creds() check > > > > as other .so. > > > > > > Indeed, but... > > > > > My point is: we will want all the .so going through the same code > > path, so security_ functions are called consistently across all the > > objects, And in the future, if we want to develop additional LSM > > functionality based on AT_CHECK, it will be applied to all objects. > > I'll extend the doc to encourage LSMs to check for __FMODE_EXEC, which > already is the common security check for all executable dependencies. > As extra information, they can get explicit requests by looking at > execveat+AT_CHECK call. > I agree that security_file_open + __FMODE_EXEC for checking all the .so (e.g for executable memfd) is a better option than checking at security_bprm_creds_for_exec. But then maybe execveat( AT_CHECK) can return after calling alloc_bprm ? See below call graph: do_execveat_common (AT_CHECK) -> alloc_bprm ->->do_open_execat ->->-> do_filp_open (__FMODE_EXEC) ->->->->->->> security_file_open -> bprm_execve ->-> prepare_exec_creds ->->-> prepare_creds ->->->-> security_prepare_creds ->-> security_bprm_creds_for_exec What is the consideration to mark the end at security_bprm_creds_for_exec ? i.e. including brpm_execve, prepare_creds, security_prepare_creds, security_bprm_creds_for_exec. Since dynamic linker doesn't load ld.so (it is by kernel), ld.so won't go through those security_prepare_creds and security_bprm_creds_for_exec checks like other .so do. > > > > Another thing to consider is: we are asking userspace to make > > additional syscall before loading the file into memory/get executed, > > there is a possibility for future expansion of the mechanism, without > > asking user space to add another syscall again. > > AT_CHECK is defined with a specific semantic. Other mechanisms (e.g. > LSM policies) could enforce other restrictions following the same > semantic. We need to keep in mind backward compatibility. > > > > > I m still not convinced yet that execveat(AT_CHECK) fits more than > > faccessat(AT_CHECK) > > faccessat2(2) is dedicated to file permission/attribute check. > execveat(2) is dedicated to execution, which is a superset of file > permission for executability, plus other checks (e.g. noexec). > That sounds reasonable, but if execveat(AT_CHECK) changes behavior of execveat(), someone might argue that faccessat2(EXEC_CHECK) can be made for the executability. I think the decision might depend on what this PATCH intended to check, i.e. where we draw the line. do_open_execat() seems to cover lots of checks for executability, if we are ok with the thing that do_open_execat() checks, then faccessat(AT_CHECK) calling do_open_execat() is an option, it won't have those "unrelated" calls in execve path, e.g. bprm_stack_limits, copy argc/env . However, you mentioned superset of file permission for executability, can you elaborate on that ? Is there something not included in do_open_execat() but still necessary for execveat(AT_CHECK)? maybe security_bprm_creds_for_exec? (this goes back to my question above) Thanks Best regards, -Jeff > > > > > > > > > > > > As my previous email, the ChromeOS LSM restricts executable mfd > > > > through security_bprm_creds(), the end result is that ld.so can still > > > > be executable memfd, but not other .so. > > > > > > The chromeOS LSM can check that with the security_file_open() hook and > > > the __FMODE_EXEC flag, see Landlock's implementation. I think this > > > should be the only hook implementation that chromeOS LSM needs to add. > > > > > > > > > > > One way to address this is to refactor the necessary code from > > > > execveat() code patch, and make it available to call from both kernel > > > > and execveat() code paths., but if we do that, we might as well use > > > > faccessat2(AT_CHECK) > > > > > > That's why I think it makes sense to rely on the existing __FMODE_EXEC > > > information. > > > > > > > > > > > > > > > > > - transfer the control to ld.so) > > > > > > - ld.so open (libc.so) > > > > > > - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch, > > > > > > require dynamic linker change. > > > > > > - ld.so mmap(libc.so,rx) > > > > > > > > > > Explaining these steps is useful. I'll include that in the next patch > > > > > series. > >
On Mon, Aug 05, 2024 at 11:35:09AM -0700, Jeff Xu wrote: > On Tue, Jul 23, 2024 at 6:15 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > On Fri, Jul 19, 2024 at 08:27:18AM -0700, Jeff Xu wrote: > > > On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote: > > > > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote: > > > > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > > > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > > > > > > > > allowed for execution. The main use case is for script interpreters and > > > > > > > > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > > > > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > > > > > > > > which script (instead of interpreter) accessed a file. As any > > > > > > > > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > > > > > > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > > > > > > > > real execution, user space gets the same error codes. > > > > > > > > > > > > > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > > > > > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > > > > > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > > > > > > > > > > sure it passes AT_CHECK, before loading it into memory. > > > > > > > > > > > > > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which > > > > > > > > > > perform the main executability checks (with the __FMODE_EXEC flag). > > > > > > > > > > Did I miss something? > > > > > > > > > > > > > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. > > > > > > > > > The app can choose its own dynamic linker path during build, (maybe > > > > > > > > > even statically link one ?) This is another reason that relying on a > > > > > > > > > userspace only is not enough. > > > > > > > > > > > > > > > > The kernel calls open_exec() on all dependencies, including > > > > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too. > > > > > > > > > > > > > > > This might not be entirely true. iiuc, kernel calls open_exec for > > > > > > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6) > > > > > > > > > > > > Correct, the dynamic linker is in charge of that, which is why it must > > > > > > be enlighten with execveat+AT_CHECK and securebits checks. > > > > > > > > > > > > > load_elf_binary() { > > > > > > > interpreter = open_exec(elf_interpreter); > > > > > > > } > > > > > > > > > > > > > > libc.so.6 is opened and mapped by dynamic linker. > > > > > > > so the call sequence is: > > > > > > > execve(a.out) > > > > > > > - open exec(a.out) > > > > > > > - security_bprm_creds(a.out) > > > > > > > - open the exec(ld.so) > > > > > > > - call open_exec() for interruptor (ld.so) > > > > > > > - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through > > > > > > > the same check and code path as libc.so below ? > > > > > > > > > > > > open_exec() checks are enough. LSMs can use this information (open + > > > > > > __FMODE_EXEC) if needed. execveat+AT_CHECK is only a user space > > > > > > request. > > > > > > > > > > > Then the ld.so doesn't go through the same security_bprm_creds() check > > > > > as other .so. > > > > > > > > Indeed, but... > > > > > > > My point is: we will want all the .so going through the same code > > > path, so security_ functions are called consistently across all the > > > objects, And in the future, if we want to develop additional LSM > > > functionality based on AT_CHECK, it will be applied to all objects. > > > > I'll extend the doc to encourage LSMs to check for __FMODE_EXEC, which > > already is the common security check for all executable dependencies. > > As extra information, they can get explicit requests by looking at > > execveat+AT_CHECK call. > > > I agree that security_file_open + __FMODE_EXEC for checking all > the .so (e.g for executable memfd) is a better option than checking at > security_bprm_creds_for_exec. > > But then maybe execveat( AT_CHECK) can return after calling alloc_bprm ? > See below call graph: > > do_execveat_common (AT_CHECK) > -> alloc_bprm > ->->do_open_execat > ->->-> do_filp_open (__FMODE_EXEC) > ->->->->->->> security_file_open > -> bprm_execve > ->-> prepare_exec_creds > ->->-> prepare_creds > ->->->-> security_prepare_creds > ->-> security_bprm_creds_for_exec > > What is the consideration to mark the end at > security_bprm_creds_for_exec ? i.e. including brpm_execve, > prepare_creds, security_prepare_creds, security_bprm_creds_for_exec. This enables LSMs to know/log an explicit execution request, including context with argv and envp. > > Since dynamic linker doesn't load ld.so (it is by kernel), ld.so > won't go through those security_prepare_creds and > security_bprm_creds_for_exec checks like other .so do. Yes, but this is not an issue nor an explicit request. ld.so is only one case of this patch series. > > > > > > > Another thing to consider is: we are asking userspace to make > > > additional syscall before loading the file into memory/get executed, > > > there is a possibility for future expansion of the mechanism, without > > > asking user space to add another syscall again. > > > > AT_CHECK is defined with a specific semantic. Other mechanisms (e.g. > > LSM policies) could enforce other restrictions following the same > > semantic. We need to keep in mind backward compatibility. > > > > > > > > I m still not convinced yet that execveat(AT_CHECK) fits more than > > > faccessat(AT_CHECK) > > > > faccessat2(2) is dedicated to file permission/attribute check. > > execveat(2) is dedicated to execution, which is a superset of file > > permission for executability, plus other checks (e.g. noexec). > > > That sounds reasonable, but if execveat(AT_CHECK) changes behavior of > execveat(), someone might argue that faccessat2(EXEC_CHECK) can be > made for the executability. AT_CHECK, as any other syscall flags, changes the behavior of execveat, but the overall semantic is clearly defined. Again, faccessat2 is only dedicated to file attributes/permissions, not file executability. > > I think the decision might depend on what this PATCH intended to > check, i.e. where we draw the line. The goal is clearly defined in the cover letter and patches: makes it possible to control (or log) script execution. > > do_open_execat() seems to cover lots of checks for executability, if > we are ok with the thing that do_open_execat() checks, then > faccessat(AT_CHECK) calling do_open_execat() is an option, it won't > have those "unrelated" calls in execve path, e.g. bprm_stack_limits, > copy argc/env . I don't thing there is any unrelated calls in execve path, quite the contrary, it follows the same semantic as for a full execution, and that's another argument to use the execveat interface. Otherwise, we couldn't argue that `./script.sh` can be the same as `sh script.sh` The only difference is that user space is in charge of parsing and interpreting the file's content. > > However, you mentioned superset of file permission for executability, > can you elaborate on that ? Is there something not included in > do_open_execat() but still necessary for execveat(AT_CHECK)? maybe > security_bprm_creds_for_exec? (this goes back to my question above) As explained above, the goal is to have the same semantic as a full execveat call, taking into account all the checks (e.g. stack limit, argv/envp...). > > Thanks > Best regards, > -Jeff > > > > > > > > > > > > > > > > > > > > > > > > > > > As my previous email, the ChromeOS LSM restricts executable mfd > > > > > through security_bprm_creds(), the end result is that ld.so can still > > > > > be executable memfd, but not other .so. > > > > > > > > The chromeOS LSM can check that with the security_file_open() hook and > > > > the __FMODE_EXEC flag, see Landlock's implementation. I think this > > > > should be the only hook implementation that chromeOS LSM needs to add. > > > > > > > > > > > > > > One way to address this is to refactor the necessary code from > > > > > execveat() code patch, and make it available to call from both kernel > > > > > and execveat() code paths., but if we do that, we might as well use > > > > > faccessat2(AT_CHECK) > > > > > > > > That's why I think it makes sense to rely on the existing __FMODE_EXEC > > > > information. > > > > > > > > > > > > > > > > > > > > > - transfer the control to ld.so) > > > > > > > - ld.so open (libc.so) > > > > > > > - ld.so call execveat(AT_CHECK,libc.so) <-- proposed by this patch, > > > > > > > require dynamic linker change. > > > > > > > - ld.so mmap(libc.so,rx) > > > > > > > > > > > > Explaining these steps is useful. I'll include that in the next patch > > > > > > series. > > > >
On Fri, Aug 9, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Mon, Aug 05, 2024 at 11:35:09AM -0700, Jeff Xu wrote: > > On Tue, Jul 23, 2024 at 6:15 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > On Fri, Jul 19, 2024 at 08:27:18AM -0700, Jeff Xu wrote: > > > > On Fri, Jul 19, 2024 at 8:04 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > On Fri, Jul 19, 2024 at 07:16:55AM -0700, Jeff Xu wrote: > > > > > > On Fri, Jul 19, 2024 at 1:45 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > On Thu, Jul 18, 2024 at 06:29:54PM -0700, Jeff Xu wrote: > > > > > > > > On Thu, Jul 18, 2024 at 5:24 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > > > On Wed, Jul 17, 2024 at 07:08:17PM -0700, Jeff Xu wrote: > > > > > > > > > > On Wed, Jul 17, 2024 at 3:01 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > > > > > > > On Tue, Jul 16, 2024 at 11:33:55PM -0700, Jeff Xu wrote: > > > > > > > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > Add a new AT_CHECK flag to execveat(2) to check if a file would be > > > > > > > > > > > > > allowed for execution. The main use case is for script interpreters and > > > > > > > > > > > > > dynamic linkers to check execution permission according to the kernel's > > > > > > > > > > > > > security policy. Another use case is to add context to access logs e.g., > > > > > > > > > > > > > which script (instead of interpreter) accessed a file. As any > > > > > > > > > > > > > executable code, scripts could also use this check [1]. > > > > > > > > > > > > > > > > > > > > > > > > > > This is different than faccessat(2) which only checks file access > > > > > > > > > > > > > rights, but not the full context e.g. mount point's noexec, stack limit, > > > > > > > > > > > > > and all potential LSM extra checks (e.g. argv, envp, credentials). > > > > > > > > > > > > > Since the use of AT_CHECK follows the exact kernel semantic as for a > > > > > > > > > > > > > real execution, user space gets the same error codes. > > > > > > > > > > > > > > > > > > > > > > > > > So we concluded that execveat(AT_CHECK) will be used to check the > > > > > > > > > > > > exec, shared object, script and config file (such as seccomp config), > > > > > > > > > > > > > > > > > > > I think binfmt_elf.c in the kernel needs to check the ld.so to make > > > > > > > > > > > > sure it passes AT_CHECK, before loading it into memory. > > > > > > > > > > > > > > > > > > > > > > All ELF dependencies are opened and checked with open_exec(), which > > > > > > > > > > > perform the main executability checks (with the __FMODE_EXEC flag). > > > > > > > > > > > Did I miss something? > > > > > > > > > > > > > > > > > > > > > I mean the ld-linux-x86-64.so.2 which is loaded by binfmt in the kernel. > > > > > > > > > > The app can choose its own dynamic linker path during build, (maybe > > > > > > > > > > even statically link one ?) This is another reason that relying on a > > > > > > > > > > userspace only is not enough. > > > > > > > > > > > > > > > > > > The kernel calls open_exec() on all dependencies, including > > > > > > > > > ld-linux-x86-64.so.2, so these files are checked for executability too. > > > > > > > > > > > > > > > > > This might not be entirely true. iiuc, kernel calls open_exec for > > > > > > > > open_exec for interpreter, but not all its dependency (e.g. libc.so.6) > > > > > > > > > > > > > > Correct, the dynamic linker is in charge of that, which is why it must > > > > > > > be enlighten with execveat+AT_CHECK and securebits checks. > > > > > > > > > > > > > > > load_elf_binary() { > > > > > > > > interpreter = open_exec(elf_interpreter); > > > > > > > > } > > > > > > > > > > > > > > > > libc.so.6 is opened and mapped by dynamic linker. > > > > > > > > so the call sequence is: > > > > > > > > execve(a.out) > > > > > > > > - open exec(a.out) > > > > > > > > - security_bprm_creds(a.out) > > > > > > > > - open the exec(ld.so) > > > > > > > > - call open_exec() for interruptor (ld.so) > > > > > > > > - call execveat(AT_CHECK, ld.so) <-- do we want ld.so going through > > > > > > > > the same check and code path as libc.so below ? > > > > > > > > > > > > > > open_exec() checks are enough. LSMs can use this information (open + > > > > > > > __FMODE_EXEC) if needed. execveat+AT_CHECK is only a user space > > > > > > > request. > > > > > > > > > > > > > Then the ld.so doesn't go through the same security_bprm_creds() check > > > > > > as other .so. > > > > > > > > > > Indeed, but... > > > > > > > > > My point is: we will want all the .so going through the same code > > > > path, so security_ functions are called consistently across all the > > > > objects, And in the future, if we want to develop additional LSM > > > > functionality based on AT_CHECK, it will be applied to all objects. > > > > > > I'll extend the doc to encourage LSMs to check for __FMODE_EXEC, which > > > already is the common security check for all executable dependencies. > > > As extra information, they can get explicit requests by looking at > > > execveat+AT_CHECK call. > > > > > I agree that security_file_open + __FMODE_EXEC for checking all > > the .so (e.g for executable memfd) is a better option than checking at > > security_bprm_creds_for_exec. > > > > But then maybe execveat( AT_CHECK) can return after calling alloc_bprm ? > > See below call graph: > > > > do_execveat_common (AT_CHECK) > > -> alloc_bprm > > ->->do_open_execat > > ->->-> do_filp_open (__FMODE_EXEC) > > ->->->->->->> security_file_open > > -> bprm_execve > > ->-> prepare_exec_creds > > ->->-> prepare_creds > > ->->->-> security_prepare_creds > > ->-> security_bprm_creds_for_exec > > > > What is the consideration to mark the end at > > security_bprm_creds_for_exec ? i.e. including brpm_execve, > > prepare_creds, security_prepare_creds, security_bprm_creds_for_exec. > > This enables LSMs to know/log an explicit execution request, including > context with argv and envp. > > > > > Since dynamic linker doesn't load ld.so (it is by kernel), ld.so > > won't go through those security_prepare_creds and > > security_bprm_creds_for_exec checks like other .so do. > > Yes, but this is not an issue nor an explicit request. ld.so is only one > case of this patch series. > > > > > > > > > > > Another thing to consider is: we are asking userspace to make > > > > additional syscall before loading the file into memory/get executed, > > > > there is a possibility for future expansion of the mechanism, without > > > > asking user space to add another syscall again. > > > > > > AT_CHECK is defined with a specific semantic. Other mechanisms (e.g. > > > LSM policies) could enforce other restrictions following the same > > > semantic. We need to keep in mind backward compatibility. > > > > > > > > > > > I m still not convinced yet that execveat(AT_CHECK) fits more than > > > > faccessat(AT_CHECK) > > > > > > faccessat2(2) is dedicated to file permission/attribute check. > > > execveat(2) is dedicated to execution, which is a superset of file > > > permission for executability, plus other checks (e.g. noexec). > > > > > That sounds reasonable, but if execveat(AT_CHECK) changes behavior of > > execveat(), someone might argue that faccessat2(EXEC_CHECK) can be > > made for the executability. > > AT_CHECK, as any other syscall flags, changes the behavior of execveat, > but the overall semantic is clearly defined. > > Again, faccessat2 is only dedicated to file attributes/permissions, not > file executability. > > > > > I think the decision might depend on what this PATCH intended to > > check, i.e. where we draw the line. > > The goal is clearly defined in the cover letter and patches: makes it > possible to control (or log) script execution. > > > > > do_open_execat() seems to cover lots of checks for executability, if > > we are ok with the thing that do_open_execat() checks, then > > faccessat(AT_CHECK) calling do_open_execat() is an option, it won't > > have those "unrelated" calls in execve path, e.g. bprm_stack_limits, > > copy argc/env . > > I don't thing there is any unrelated calls in execve path, quite the > contrary, it follows the same semantic as for a full execution, and > that's another argument to use the execveat interface. Otherwise, we > couldn't argue that `./script.sh` can be the same as `sh script.sh` > It is a good point from the "scrip.sh/exec" perspective that we want it to go through the same execve path. The reasoning is not obvious from the ".so" which doesn't go through stack/env check. Since execveat(AT_CHECK) wants to cover both cases, it is fine. > The only difference is that user space is in charge of parsing and > interpreting the file's content. > > > > > However, you mentioned superset of file permission for executability, > > can you elaborate on that ? Is there something not included in > > do_open_execat() but still necessary for execveat(AT_CHECK)? maybe > > security_bprm_creds_for_exec? (this goes back to my question above) > > As explained above, the goal is to have the same semantic as a full > execveat call, taking into account all the checks (e.g. stack limit, > argv/envp...). > I'm fine with this, thanks for taking time to explain the design. Regarding the future LSM based on this patch series: For .so, security_file_open is recommended for LSM. For scripts/exec (that needs a full exec code path), security_file_open and security_bprm_creds_for_exec can both be used. Thanks Best regards, -Jeff
diff --git a/fs/exec.c b/fs/exec.c index 40073142288f..ea2a1867afdc 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -931,7 +931,7 @@ static struct file *do_open_execat(int fd, struct filename *name, int flags) .lookup_flags = LOOKUP_FOLLOW, }; - if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH)) != 0) + if ((flags & ~(AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH | AT_CHECK)) != 0) return ERR_PTR(-EINVAL); if (flags & AT_SYMLINK_NOFOLLOW) open_exec_flags.lookup_flags &= ~LOOKUP_FOLLOW; @@ -1595,6 +1595,7 @@ static struct linux_binprm *alloc_bprm(int fd, struct filename *filename, int fl bprm->filename = bprm->fdpath; } bprm->interp = bprm->filename; + bprm->is_check = !!(flags & AT_CHECK); retval = bprm_mm_init(bprm); if (!retval) @@ -1885,7 +1886,7 @@ static int bprm_execve(struct linux_binprm *bprm) /* Set the unchanging part of bprm->cred */ retval = security_bprm_creds_for_exec(bprm); - if (retval) + if (retval || bprm->is_check) goto out; retval = exec_binprm(bprm); diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 70f97f685bff..8ff9c9e33aed 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -42,7 +42,12 @@ struct linux_binprm { * Set when errors can no longer be returned to the * original userspace. */ - point_of_no_return:1; + point_of_no_return:1, + /* + * Set by user space to check executability according to the + * caller's environment. + */ + is_check:1; struct file *executable; /* Executable to pass to the interpreter */ struct file *interpreter; struct file *file; diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index c0bcc185fa48..bcd05c59b7df 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -118,6 +118,36 @@ #define AT_HANDLE_FID AT_REMOVEDIR /* file handle is needed to compare object identity and may not be usable to open_by_handle_at(2) */ + +/* + * AT_CHECK only performs a check on a regular file and returns 0 if execution + * of this file would be allowed, ignoring the file format and then the related + * interpreter dependencies (e.g. ELF libraries, script's shebang). AT_CHECK + * should only be used if SECBIT_SHOULD_EXEC_CHECK is set for the calling + * thread. See securebits.h documentation. + * + * Programs should use this check to apply kernel-level checks against files + * that are not directly executed by the kernel but directly passed to a user + * space interpreter instead. All files that contain executable code, from the + * point of view of the interpreter, should be checked. The main purpose of + * this flag is to improve the security and consistency of an execution + * environment to ensure that direct file execution (e.g. ./script.sh) and + * indirect file execution (e.g. sh script.sh) lead to the same result. For + * instance, this can be used to check if a file is trustworthy according to + * the caller's environment. + * + * In a secure environment, libraries and any executable dependencies should + * also be checked. For instance dynamic linking should make sure that all + * libraries are allowed for execution to avoid trivial bypass (e.g. using + * LD_PRELOAD). For such secure execution environment to make sense, only + * trusted code should be executable, which also requires integrity guarantees. + * + * To avoid race conditions leading to time-of-check to time-of-use issues, + * AT_CHECK should be used with AT_EMPTY_PATH to check against a file + * descriptor instead of a path. + */ +#define AT_CHECK 0x10000 + #if defined(__KERNEL__) #define AT_GETATTR_NOSEC 0x80000000 #endif diff --git a/kernel/audit.h b/kernel/audit.h index a60d2840559e..8ebdabd2ab81 100644 --- a/kernel/audit.h +++ b/kernel/audit.h @@ -197,6 +197,7 @@ struct audit_context { struct open_how openat2; struct { int argc; + bool is_check; } execve; struct { char *name; diff --git a/kernel/auditsc.c b/kernel/auditsc.c index 6f0d6fb6523f..b6316e284342 100644 --- a/kernel/auditsc.c +++ b/kernel/auditsc.c @@ -2662,6 +2662,7 @@ void __audit_bprm(struct linux_binprm *bprm) context->type = AUDIT_EXECVE; context->execve.argc = bprm->argc; + context->execve.is_check = bprm->is_check; }
Add a new AT_CHECK flag to execveat(2) to check if a file would be allowed for execution. The main use case is for script interpreters and dynamic linkers to check execution permission according to the kernel's security policy. Another use case is to add context to access logs e.g., which script (instead of interpreter) accessed a file. As any executable code, scripts could also use this check [1]. This is different than faccessat(2) which only checks file access rights, but not the full context e.g. mount point's noexec, stack limit, and all potential LSM extra checks (e.g. argv, envp, credentials). Since the use of AT_CHECK follows the exact kernel semantic as for a real execution, user space gets the same error codes. With the information that a script interpreter is about to interpret a script, an LSM security policy can adjust caller's access rights or log execution request as for native script execution (e.g. role transition). This is possible thanks to the call to security_bprm_creds_for_exec(). Because LSMs may only change bprm's credentials, use of AT_CHECK with current kernel code should not be a security issue (e.g. unexpected role transition). LSMs willing to update the caller's credential could now do so when bprm->is_check is set. Of course, such policy change should be in line with the new user space code. Because AT_CHECK is dedicated to user space interpreters, it doesn't make sense for the kernel to parse the checked files, look for interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC if the format is unknown. Because of that, security_bprm_check() is never called when AT_CHECK is used. It should be noted that script interpreters cannot directly use execveat(2) (without this new AT_CHECK flag) because this could lead to unexpected behaviors e.g., `python script.sh` could lead to Bash being executed to interpret the script. Unlike the kernel, script interpreters may just interpret the shebang as a simple comment, which should not change for backward compatibility reasons. Because scripts or libraries files might not currently have the executable permission set, or because we might want specific users to be allowed to run arbitrary scripts, the following patch provides a dynamic configuration mechanism with the SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT securebits. This is a redesign of the CLIP OS 4's O_MAYEXEC: https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch This patch has been used for more than a decade with customized script interpreters. Some examples can be found here: https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Paul Moore <paul@paul-moore.com> Link: https://docs.python.org/3/library/io.html#io.open_code [1] Signed-off-by: Mickaël Salaün <mic@digikod.net> Link: https://lore.kernel.org/r/20240704190137.696169-2-mic@digikod.net --- New design since v18: https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net --- fs/exec.c | 5 +++-- include/linux/binfmts.h | 7 ++++++- include/uapi/linux/fcntl.h | 30 ++++++++++++++++++++++++++++++ kernel/audit.h | 1 + kernel/auditsc.c | 1 + 5 files changed, 41 insertions(+), 3 deletions(-)