Message ID | 20240704190137.696169-3-mic@digikod.net (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Script execution control (was O_MAYEXEC) | expand |
On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote: > Such a secure environment can be achieved with an appropriate access > control policy (e.g. mount's noexec option, file access rights, LSM > configuration) and an enlighten ld.so checking that libraries are > allowed for execution e.g., to protect against illegitimate use of > LD_PRELOAD. > > Scripts may need some changes to deal with untrusted data (e.g. stdin, > environment variables), but that is outside the scope of the kernel. If the threat model includes an attacker sitting at a shell prompt, we need to be very careful about how process perform enforcement. E.g. even on a locked down system, if an attacker has access to LD_PRELOAD or a seccomp wrapper (which you both mention here), it would be possible to run commands where the resulting process is tricked into thinking it doesn't have the bits set. But this would be exactly true for calling execveat(): LD_PRELOAD or seccomp policy could have it just return 0. While I like AT_CHECK, I do wonder if it's better to do the checks via open(), as was originally designed with O_MAYEXEC. Because then enforcement is gated by the kernel -- the process does not get a file descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into doing. And this thinking also applies to faccessat() too: if a process can be tricked into thinking the access check passed, it'll happily interpret whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC is being checked seems substantially safer to me...
On Thu, Jul 04, 2024 at 05:18:04PM -0700, Kees Cook wrote: > On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote: > > Such a secure environment can be achieved with an appropriate access > > control policy (e.g. mount's noexec option, file access rights, LSM > > configuration) and an enlighten ld.so checking that libraries are > > allowed for execution e.g., to protect against illegitimate use of > > LD_PRELOAD. > > > > Scripts may need some changes to deal with untrusted data (e.g. stdin, > > environment variables), but that is outside the scope of the kernel. > > If the threat model includes an attacker sitting at a shell prompt, we > need to be very careful about how process perform enforcement. E.g. even > on a locked down system, if an attacker has access to LD_PRELOAD or a LD_PRELOAD should be OK once ld.so will be patched to check the libraries. We can still imagine a debug library used to bypass security checks, but in this case the issue would be that this library is executable in the first place. > seccomp wrapper (which you both mention here), it would be possible to > run commands where the resulting process is tricked into thinking it > doesn't have the bits set. As explained in the UAPI comments, all parent processes need to be trusted. This meeans that their code is trusted, their seccomp filters are trusted, and that they are patched, if needed, to check file executability. > > But this would be exactly true for calling execveat(): LD_PRELOAD or > seccomp policy could have it just return 0. If an attacker is allowed/able to load an arbitrary seccomp filter on a process, we cannot trust this process. > > While I like AT_CHECK, I do wonder if it's better to do the checks via > open(), as was originally designed with O_MAYEXEC. Because then > enforcement is gated by the kernel -- the process does not get a file > descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into > doing. Being able to check a path name or a file descriptor (with the same syscall) is more flexible and cover more use cases. The execveat(2) interface, including current and future flags, is dedicated to file execution. I then think that using execveat(2) for this kind of check makes more sense, and will easily evolve with this syscall. > > And this thinking also applies to faccessat() too: if a process can be > tricked into thinking the access check passed, it'll happily interpret > whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC > is being checked seems substantially safer to me... If attackers can filter execveat(2), they can also filter open(2) and any other syscalls. In all cases, that would mean an issue in the security policy.
On Fri, Jul 05, 2024 at 07:54:16PM +0200, Mickaël Salaün wrote: > On Thu, Jul 04, 2024 at 05:18:04PM -0700, Kees Cook wrote: > > On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote: > > > Such a secure environment can be achieved with an appropriate access > > > control policy (e.g. mount's noexec option, file access rights, LSM > > > configuration) and an enlighten ld.so checking that libraries are > > > allowed for execution e.g., to protect against illegitimate use of > > > LD_PRELOAD. > > > > > > Scripts may need some changes to deal with untrusted data (e.g. stdin, > > > environment variables), but that is outside the scope of the kernel. > > > > If the threat model includes an attacker sitting at a shell prompt, we > > need to be very careful about how process perform enforcement. E.g. even > > on a locked down system, if an attacker has access to LD_PRELOAD or a > > LD_PRELOAD should be OK once ld.so will be patched to check the > libraries. We can still imagine a debug library used to bypass security > checks, but in this case the issue would be that this library is > executable in the first place. Ah yes, that's fair: the shell would discover the malicious library while using AT_CHECK during resolution of the LD_PRELOAD. > > seccomp wrapper (which you both mention here), it would be possible to > > run commands where the resulting process is tricked into thinking it > > doesn't have the bits set. > > As explained in the UAPI comments, all parent processes need to be > trusted. This meeans that their code is trusted, their seccomp filters > are trusted, and that they are patched, if needed, to check file > executability. But we have launchers that apply arbitrary seccomp policy, e.g. minijail on Chrome OS, or even systemd on regular distros. In theory, this should be handled via other ACLs. > > But this would be exactly true for calling execveat(): LD_PRELOAD or > > seccomp policy could have it just return 0. > > If an attacker is allowed/able to load an arbitrary seccomp filter on a > process, we cannot trust this process. > > > > > While I like AT_CHECK, I do wonder if it's better to do the checks via > > open(), as was originally designed with O_MAYEXEC. Because then > > enforcement is gated by the kernel -- the process does not get a file > > descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into > > doing. > > Being able to check a path name or a file descriptor (with the same > syscall) is more flexible and cover more use cases. If flexibility costs us reliability, I think that flexibility is not a benefit. > The execveat(2) > interface, including current and future flags, is dedicated to file > execution. I then think that using execveat(2) for this kind of check > makes more sense, and will easily evolve with this syscall. Yeah, I do recognize that is feels much more natural, but I remain unhappy about how difficult it will become to audit a system for safety when the check is strictly per-process opt-in, and not enforced by the kernel for a given process tree. But, I think this may have always been a fiction in my mind. :) > > And this thinking also applies to faccessat() too: if a process can be > > tricked into thinking the access check passed, it'll happily interpret > > whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC > > is being checked seems substantially safer to me... > > If attackers can filter execveat(2), they can also filter open(2) and > any other syscalls. In all cases, that would mean an issue in the > security policy. Hm, as in, make a separate call to open(2) without O_MAYEXEC, and pass that fd back to the filtered open(2) that did have O_MAYEXEC. Yes, true. I guess it does become morally equivalent. Okay. Well, let me ask about usability. Right now, a process will need to do: - should I use AT_CHECK? (check secbit) - if yes: perform execveat(AT_CHECK) Why not leave the secbit test up to the kernel, and then the program can just unconditionally call execveat(AT_CHECK)? Though perhaps the issue here is that an execveat() EINVAL doesn't tell the program if AT_CHECK is unimplemented or if something else went wrong, and the secbit prctl() will give the correct signal about AT_CHECK availability?
On Sat Jul 6, 2024 at 12:44 AM EEST, Kees Cook wrote: > > As explained in the UAPI comments, all parent processes need to be > > trusted. This meeans that their code is trusted, their seccomp filters > > are trusted, and that they are patched, if needed, to check file > > executability. > > But we have launchers that apply arbitrary seccomp policy, e.g. minijail > on Chrome OS, or even systemd on regular distros. In theory, this should > be handled via other ACLs. Or a regular web browser? AFAIK seccomp filtering was the tool to make secure browser tabs in the first place. BR, Jarkko
On Fri, Jul 05, 2024 at 02:44:03PM -0700, Kees Cook wrote: > On Fri, Jul 05, 2024 at 07:54:16PM +0200, Mickaël Salaün wrote: > > On Thu, Jul 04, 2024 at 05:18:04PM -0700, Kees Cook wrote: > > > On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote: > > > > Such a secure environment can be achieved with an appropriate access > > > > control policy (e.g. mount's noexec option, file access rights, LSM > > > > configuration) and an enlighten ld.so checking that libraries are > > > > allowed for execution e.g., to protect against illegitimate use of > > > > LD_PRELOAD. > > > > > > > > Scripts may need some changes to deal with untrusted data (e.g. stdin, > > > > environment variables), but that is outside the scope of the kernel. > > > > > > If the threat model includes an attacker sitting at a shell prompt, we > > > need to be very careful about how process perform enforcement. E.g. even > > > on a locked down system, if an attacker has access to LD_PRELOAD or a > > > > LD_PRELOAD should be OK once ld.so will be patched to check the > > libraries. We can still imagine a debug library used to bypass security > > checks, but in this case the issue would be that this library is > > executable in the first place. > > Ah yes, that's fair: the shell would discover the malicious library > while using AT_CHECK during resolution of the LD_PRELOAD. That's the idea, but it would be checked by ld.so, not the shell. > > > > seccomp wrapper (which you both mention here), it would be possible to > > > run commands where the resulting process is tricked into thinking it > > > doesn't have the bits set. > > > > As explained in the UAPI comments, all parent processes need to be > > trusted. This meeans that their code is trusted, their seccomp filters > > are trusted, and that they are patched, if needed, to check file > > executability. > > But we have launchers that apply arbitrary seccomp policy, e.g. minijail > on Chrome OS, or even systemd on regular distros. In theory, this should > be handled via other ACLs. Processes running with untrusted seccomp filter should be considered untrusted. It would then make sense for these seccomp filters/programs to be considered executable code, and then for minijail and systemd to check them with AT_CHECK (according to the securebits policy). > > > > But this would be exactly true for calling execveat(): LD_PRELOAD or > > > seccomp policy could have it just return 0. > > > > If an attacker is allowed/able to load an arbitrary seccomp filter on a > > process, we cannot trust this process. > > > > > > > > While I like AT_CHECK, I do wonder if it's better to do the checks via > > > open(), as was originally designed with O_MAYEXEC. Because then > > > enforcement is gated by the kernel -- the process does not get a file > > > descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into > > > doing. > > > > Being able to check a path name or a file descriptor (with the same > > syscall) is more flexible and cover more use cases. > > If flexibility costs us reliability, I think that flexibility is not > a benefit. Well, it's a matter of letting user space do what they think is best, and I think there are legitimate and safe uses of path names, even if I agree that this should not be used in most use cases. Would we want faccessat2(2) to only take file descriptor as argument and not file path? I don't think so but I'd defer to the VFS maintainers. Christian, Al, Linus? Steve, could you share a use case with file paths? > > > The execveat(2) > > interface, including current and future flags, is dedicated to file > > execution. I then think that using execveat(2) for this kind of check > > makes more sense, and will easily evolve with this syscall. > > Yeah, I do recognize that is feels much more natural, but I remain > unhappy about how difficult it will become to audit a system for safety > when the check is strictly per-process opt-in, and not enforced by the > kernel for a given process tree. But, I think this may have always been > a fiction in my mind. :) Hmm, I'm not sure to follow. Securebits are inherited, so process tree. And we need the parent processes to be trusted anyway. > > > > And this thinking also applies to faccessat() too: if a process can be > > > tricked into thinking the access check passed, it'll happily interpret > > > whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC > > > is being checked seems substantially safer to me... > > > > If attackers can filter execveat(2), they can also filter open(2) and > > any other syscalls. In all cases, that would mean an issue in the > > security policy. > > Hm, as in, make a separate call to open(2) without O_MAYEXEC, and pass > that fd back to the filtered open(2) that did have O_MAYEXEC. Yes, true. > > I guess it does become morally equivalent. > > Okay. Well, let me ask about usability. Right now, a process will need > to do: > > - should I use AT_CHECK? (check secbit) > - if yes: perform execveat(AT_CHECK) > > Why not leave the secbit test up to the kernel, and then the program can > just unconditionally call execveat(AT_CHECK)? That was kind of the approach of the previous patch series and Linus wanted the new interface to follow the kernel semantic. Enforcing this kind of restriction will always be the duty of user space anyway, so I think it's simpler (i.e. no mix of policy definition, access check, and policy enforcement, but a standalone execveat feature), more flexible, and it fully delegates the policy enforcement to user space instead of trying to enforce some part in the kernel which would only give the illusion of security/policy enforcement. > > Though perhaps the issue here is that an execveat() EINVAL doesn't > tell the program if AT_CHECK is unimplemented or if something else > went wrong, and the secbit prctl() will give the correct signal about > AT_CHECK availability? This kind of check could indeed help to identify the issue.
On Sat, Jul 06, 2024 at 01:22:06AM +0300, Jarkko Sakkinen wrote: > On Sat Jul 6, 2024 at 12:44 AM EEST, Kees Cook wrote: > > > As explained in the UAPI comments, all parent processes need to be > > > trusted. This meeans that their code is trusted, their seccomp filters > > > are trusted, and that they are patched, if needed, to check file > > > executability. > > > > But we have launchers that apply arbitrary seccomp policy, e.g. minijail > > on Chrome OS, or even systemd on regular distros. In theory, this should > > be handled via other ACLs. > > Or a regular web browser? AFAIK seccomp filtering was the tool to make > secure browser tabs in the first place. Yes, and that't OK. Web browsers embedded their own seccomp filters and they are then as trusted as the browser code.
On Sat Jul 6, 2024 at 5:56 PM EEST, Mickaël Salaün wrote: > On Sat, Jul 06, 2024 at 01:22:06AM +0300, Jarkko Sakkinen wrote: > > On Sat Jul 6, 2024 at 12:44 AM EEST, Kees Cook wrote: > > > > As explained in the UAPI comments, all parent processes need to be > > > > trusted. This meeans that their code is trusted, their seccomp filters > > > > are trusted, and that they are patched, if needed, to check file > > > > executability. > > > > > > But we have launchers that apply arbitrary seccomp policy, e.g. minijail > > > on Chrome OS, or even systemd on regular distros. In theory, this should > > > be handled via other ACLs. > > > > Or a regular web browser? AFAIK seccomp filtering was the tool to make > > secure browser tabs in the first place. > > Yes, and that't OK. Web browsers embedded their own seccomp filters and > they are then as trusted as the browser code. I'd recommend to slice of tech detail from cover letter, as long as those details are in the commit messages. Then, in the cover letter I'd go through maybe two familiar scenarios, with interactions to this functionality. A desktop web browser could be perhaps one of them... BR, Jarkko
Hi On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and > their *_LOCKED counterparts are designed to be set by processes setting > up an execution environment, such as a user session, a container, or a > security sandbox. Like seccomp filters or Landlock domains, the > securebits are inherited across proceses. > > When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should > check executable resources with execveat(2) + AT_CHECK (see previous > patch). > > When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow > execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK). > Do we need both bits ? When CHECK is set and RESTRICT is not, the "check fail" executable will still get executed, so CHECK is for logging ? Does RESTRICT imply CHECK is set, e.g. What if CHECK=0 and RESTRICT = 1 ? > For a secure environment, we might also want > SECBIT_SHOULD_EXEC_CHECK_LOCKED and SECBIT_SHOULD_EXEC_RESTRICT_LOCKED > to be set. For a test environment (e.g. testing on a fleet to identify > potential issues), only the SECBIT_SHOULD_EXEC_CHECK* bits can be set to > still be able to identify potential issues (e.g. with interpreters logs > or LSMs audit entries). > > It should be noted that unlike other security bits, the > SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT bits are > dedicated to user space willing to restrict itself. Because of that, > they only make sense in the context of a trusted environment (e.g. > sandbox, container, user session, full system) where the process > changing its behavior (according to these bits) and all its parent > processes are trusted. Otherwise, any parent process could just execute > its own malicious code (interpreting a script or not), or even enforce a > seccomp filter to mask these bits. > > Such a secure environment can be achieved with an appropriate access > control policy (e.g. mount's noexec option, file access rights, LSM > configuration) and an enlighten ld.so checking that libraries are > allowed for execution e.g., to protect against illegitimate use of > LD_PRELOAD. > > Scripts may need some changes to deal with untrusted data (e.g. stdin, > environment variables), but that is outside the scope of the kernel. > > The only restriction enforced by the kernel is the right to ptrace > another process. Processes are denied to ptrace less restricted ones, > unless the tracer has CAP_SYS_PTRACE. This is mainly a safeguard to > avoid trivial privilege escalations e.g., by a debugging process being > abused with a confused deputy attack. > > Cc: Al Viro <viro@zeniv.linux.org.uk> > Cc: Christian Brauner <brauner@kernel.org> > Cc: Kees Cook <keescook@chromium.org> > Cc: Paul Moore <paul@paul-moore.com> > Signed-off-by: Mickaël Salaün <mic@digikod.net> > Link: https://lore.kernel.org/r/20240704190137.696169-3-mic@digikod.net > --- > > New design since v18: > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net > --- > include/uapi/linux/securebits.h | 56 ++++++++++++++++++++++++++++- > security/commoncap.c | 63 ++++++++++++++++++++++++++++----- > 2 files changed, 110 insertions(+), 9 deletions(-) > > diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h > index d6d98877ff1a..3fdb0382718b 100644 > --- a/include/uapi/linux/securebits.h > +++ b/include/uapi/linux/securebits.h > @@ -52,10 +52,64 @@ > #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \ > (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED)) > > +/* > + * When SECBIT_SHOULD_EXEC_CHECK is set, a process should check all executable > + * files with execveat(2) + AT_CHECK. However, such check should only be > + * performed if all to-be-executed code only comes from regular files. For > + * instance, if a script interpreter is called with both a script snipped as > + * argument and a regular file, the interpreter should not check any file. > + * Doing otherwise would mislead the kernel to think that only the script file > + * is being executed, which could for instance lead to unexpected permission > + * change and break current use cases. > + * > + * This secure bit may be set by user session managers, service managers, > + * container runtimes, sandboxer tools... Except for test environments, the > + * related SECBIT_SHOULD_EXEC_CHECK_LOCKED bit should also be set. > + * > + * Ptracing another process is deny if the tracer has SECBIT_SHOULD_EXEC_CHECK > + * but not the tracee. SECBIT_SHOULD_EXEC_CHECK_LOCKED also checked. > + */ > +#define SECURE_SHOULD_EXEC_CHECK 8 > +#define SECURE_SHOULD_EXEC_CHECK_LOCKED 9 /* make bit-8 immutable */ > + > +#define SECBIT_SHOULD_EXEC_CHECK (issecure_mask(SECURE_SHOULD_EXEC_CHECK)) > +#define SECBIT_SHOULD_EXEC_CHECK_LOCKED \ > + (issecure_mask(SECURE_SHOULD_EXEC_CHECK_LOCKED)) > + > +/* > + * When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow > + * execution of approved files, if any (see SECBIT_SHOULD_EXEC_CHECK). For > + * instance, script interpreters called with a script snippet as argument > + * should always deny such execution if SECBIT_SHOULD_EXEC_RESTRICT is set. > + * However, if a script interpreter is called with both > + * SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT, they should > + * interpret the provided script files if no unchecked code is also provided > + * (e.g. directly as argument). > + * > + * This secure bit may be set by user session managers, service managers, > + * container runtimes, sandboxer tools... Except for test environments, the > + * related SECBIT_SHOULD_EXEC_RESTRICT_LOCKED bit should also be set. > + * > + * Ptracing another process is deny if the tracer has > + * SECBIT_SHOULD_EXEC_RESTRICT but not the tracee. > + * SECBIT_SHOULD_EXEC_RESTRICT_LOCKED is also checked. > + */ > +#define SECURE_SHOULD_EXEC_RESTRICT 10 > +#define SECURE_SHOULD_EXEC_RESTRICT_LOCKED 11 /* make bit-8 immutable */ > + > +#define SECBIT_SHOULD_EXEC_RESTRICT (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT)) > +#define SECBIT_SHOULD_EXEC_RESTRICT_LOCKED \ > + (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT_LOCKED)) > + > #define SECURE_ALL_BITS (issecure_mask(SECURE_NOROOT) | \ > issecure_mask(SECURE_NO_SETUID_FIXUP) | \ > issecure_mask(SECURE_KEEP_CAPS) | \ > - issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE)) > + issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \ > + issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \ > + issecure_mask(SECURE_SHOULD_EXEC_RESTRICT)) > #define SECURE_ALL_LOCKS (SECURE_ALL_BITS << 1) > > +#define SECURE_ALL_UNPRIVILEGED (issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \ > + issecure_mask(SECURE_SHOULD_EXEC_RESTRICT)) > + > #endif /* _UAPI_LINUX_SECUREBITS_H */ > diff --git a/security/commoncap.c b/security/commoncap.c > index 162d96b3a676..34b4493e2a69 100644 > --- a/security/commoncap.c > +++ b/security/commoncap.c > @@ -117,6 +117,33 @@ int cap_settime(const struct timespec64 *ts, const struct timezone *tz) > return 0; > } > > +static bool ptrace_secbits_allowed(const struct cred *tracer, > + const struct cred *tracee) > +{ > + const unsigned long tracer_secbits = SECURE_ALL_UNPRIVILEGED & > + tracer->securebits; > + const unsigned long tracee_secbits = SECURE_ALL_UNPRIVILEGED & > + tracee->securebits; > + /* Ignores locking of unset secure bits (cf. SECURE_ALL_LOCKS). */ > + const unsigned long tracer_locked = (tracer_secbits << 1) & > + tracer->securebits; > + const unsigned long tracee_locked = (tracee_secbits << 1) & > + tracee->securebits; > + > + /* The tracee must not have less constraints than the tracer. */ > + if ((tracer_secbits | tracee_secbits) != tracee_secbits) > + return false; > + > + /* > + * Makes sure that the tracer's locks for restrictions are the same for > + * the tracee. > + */ > + if ((tracer_locked | tracee_locked) != tracee_locked) > + return false; > + > + return true; > +} > + > /** > * cap_ptrace_access_check - Determine whether the current process may access > * another > @@ -146,7 +173,8 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode) > else > caller_caps = &cred->cap_permitted; > if (cred->user_ns == child_cred->user_ns && > - cap_issubset(child_cred->cap_permitted, *caller_caps)) > + cap_issubset(child_cred->cap_permitted, *caller_caps) && > + ptrace_secbits_allowed(cred, child_cred)) > goto out; > if (ns_capable(child_cred->user_ns, CAP_SYS_PTRACE)) > goto out; > @@ -178,7 +206,8 @@ int cap_ptrace_traceme(struct task_struct *parent) > cred = __task_cred(parent); > child_cred = current_cred(); > if (cred->user_ns == child_cred->user_ns && > - cap_issubset(child_cred->cap_permitted, cred->cap_permitted)) > + cap_issubset(child_cred->cap_permitted, cred->cap_permitted) && > + ptrace_secbits_allowed(cred, child_cred)) > goto out; > if (has_ns_capability(parent, child_cred->user_ns, CAP_SYS_PTRACE)) > goto out; > @@ -1302,21 +1331,39 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3, > & (old->securebits ^ arg2)) /*[1]*/ > || ((old->securebits & SECURE_ALL_LOCKS & ~arg2)) /*[2]*/ > || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS)) /*[3]*/ > - || (cap_capable(current_cred(), > - current_cred()->user_ns, > - CAP_SETPCAP, > - CAP_OPT_NONE) != 0) /*[4]*/ > /* > * [1] no changing of bits that are locked > * [2] no unlocking of locks > * [3] no setting of unsupported bits > - * [4] doing anything requires privilege (go read about > - * the "sendmail capabilities bug") > */ > ) > /* cannot change a locked bit */ > return -EPERM; > > + /* > + * Doing anything requires privilege (go read about the > + * "sendmail capabilities bug"), except for unprivileged bits. > + * Indeed, the SECURE_ALL_UNPRIVILEGED bits are not > + * restrictions enforced by the kernel but by user space on > + * itself. The kernel is only in charge of protecting against > + * privilege escalation with ptrace protections. > + */ > + if (cap_capable(current_cred(), current_cred()->user_ns, > + CAP_SETPCAP, CAP_OPT_NONE) != 0) { > + const unsigned long unpriv_and_locks = > + SECURE_ALL_UNPRIVILEGED | > + SECURE_ALL_UNPRIVILEGED << 1; > + const unsigned long changed = old->securebits ^ arg2; > + > + /* For legacy reason, denies non-change. */ > + if (!changed) > + return -EPERM; > + > + /* Denies privileged changes. */ > + if (changed & ~unpriv_and_locks) > + return -EPERM; > + } > + > new = prepare_creds(); > if (!new) > return -ENOMEM; > -- > 2.45.2 >
On Mon, Jul 8, 2024 at 9:17 AM Jeff Xu <jeffxu@google.com> wrote: > > Hi > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and > > their *_LOCKED counterparts are designed to be set by processes setting > > up an execution environment, such as a user session, a container, or a > > security sandbox. Like seccomp filters or Landlock domains, the > > securebits are inherited across proceses. > > > > When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should > > check executable resources with execveat(2) + AT_CHECK (see previous > > patch). > > > > When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow > > execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK). > > > Do we need both bits ? > When CHECK is set and RESTRICT is not, the "check fail" executable > will still get executed, so CHECK is for logging ? > Does RESTRICT imply CHECK is set, e.g. What if CHECK=0 and RESTRICT = 1 ? > The intention might be "permissive mode"? if so, consider reuse existing selinux's concept, and still with 2 bits: SECBIT_SHOULD_EXEC_RESTRICT SECBIT_SHOULD_EXEC_RESTRICT_PERMISSIVE -Jeff > > For a secure environment, we might also want > > SECBIT_SHOULD_EXEC_CHECK_LOCKED and SECBIT_SHOULD_EXEC_RESTRICT_LOCKED > > to be set. For a test environment (e.g. testing on a fleet to identify > > potential issues), only the SECBIT_SHOULD_EXEC_CHECK* bits can be set to > > still be able to identify potential issues (e.g. with interpreters logs > > or LSMs audit entries). > > > > It should be noted that unlike other security bits, the > > SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT bits are > > dedicated to user space willing to restrict itself. Because of that, > > they only make sense in the context of a trusted environment (e.g. > > sandbox, container, user session, full system) where the process > > changing its behavior (according to these bits) and all its parent > > processes are trusted. Otherwise, any parent process could just execute > > its own malicious code (interpreting a script or not), or even enforce a > > seccomp filter to mask these bits. > > > > Such a secure environment can be achieved with an appropriate access > > control policy (e.g. mount's noexec option, file access rights, LSM > > configuration) and an enlighten ld.so checking that libraries are > > allowed for execution e.g., to protect against illegitimate use of > > LD_PRELOAD. > > > > Scripts may need some changes to deal with untrusted data (e.g. stdin, > > environment variables), but that is outside the scope of the kernel. > > > > The only restriction enforced by the kernel is the right to ptrace > > another process. Processes are denied to ptrace less restricted ones, > > unless the tracer has CAP_SYS_PTRACE. This is mainly a safeguard to > > avoid trivial privilege escalations e.g., by a debugging process being > > abused with a confused deputy attack. > > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > Cc: Christian Brauner <brauner@kernel.org> > > Cc: Kees Cook <keescook@chromium.org> > > Cc: Paul Moore <paul@paul-moore.com> > > Signed-off-by: Mickaël Salaün <mic@digikod.net> > > Link: https://lore.kernel.org/r/20240704190137.696169-3-mic@digikod.net > > --- > > > > New design since v18: > > https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net > > --- > > include/uapi/linux/securebits.h | 56 ++++++++++++++++++++++++++++- > > security/commoncap.c | 63 ++++++++++++++++++++++++++++----- > > 2 files changed, 110 insertions(+), 9 deletions(-) > > > > diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h > > index d6d98877ff1a..3fdb0382718b 100644 > > --- a/include/uapi/linux/securebits.h > > +++ b/include/uapi/linux/securebits.h > > @@ -52,10 +52,64 @@ > > #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \ > > (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED)) > > > > +/* > > + * When SECBIT_SHOULD_EXEC_CHECK is set, a process should check all executable > > + * files with execveat(2) + AT_CHECK. However, such check should only be > > + * performed if all to-be-executed code only comes from regular files. For > > + * instance, if a script interpreter is called with both a script snipped as > > + * argument and a regular file, the interpreter should not check any file. > > + * Doing otherwise would mislead the kernel to think that only the script file > > + * is being executed, which could for instance lead to unexpected permission > > + * change and break current use cases. > > + * > > + * This secure bit may be set by user session managers, service managers, > > + * container runtimes, sandboxer tools... Except for test environments, the > > + * related SECBIT_SHOULD_EXEC_CHECK_LOCKED bit should also be set. > > + * > > + * Ptracing another process is deny if the tracer has SECBIT_SHOULD_EXEC_CHECK > > + * but not the tracee. SECBIT_SHOULD_EXEC_CHECK_LOCKED also checked. > > + */ > > +#define SECURE_SHOULD_EXEC_CHECK 8 > > +#define SECURE_SHOULD_EXEC_CHECK_LOCKED 9 /* make bit-8 immutable */ > > + > > +#define SECBIT_SHOULD_EXEC_CHECK (issecure_mask(SECURE_SHOULD_EXEC_CHECK)) > > +#define SECBIT_SHOULD_EXEC_CHECK_LOCKED \ > > + (issecure_mask(SECURE_SHOULD_EXEC_CHECK_LOCKED)) > > + > > +/* > > + * When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow > > + * execution of approved files, if any (see SECBIT_SHOULD_EXEC_CHECK). For > > + * instance, script interpreters called with a script snippet as argument > > + * should always deny such execution if SECBIT_SHOULD_EXEC_RESTRICT is set. > > + * However, if a script interpreter is called with both > > + * SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT, they should > > + * interpret the provided script files if no unchecked code is also provided > > + * (e.g. directly as argument). > > + * > > + * This secure bit may be set by user session managers, service managers, > > + * container runtimes, sandboxer tools... Except for test environments, the > > + * related SECBIT_SHOULD_EXEC_RESTRICT_LOCKED bit should also be set. > > + * > > + * Ptracing another process is deny if the tracer has > > + * SECBIT_SHOULD_EXEC_RESTRICT but not the tracee. > > + * SECBIT_SHOULD_EXEC_RESTRICT_LOCKED is also checked. > > + */ > > +#define SECURE_SHOULD_EXEC_RESTRICT 10 > > +#define SECURE_SHOULD_EXEC_RESTRICT_LOCKED 11 /* make bit-8 immutable */ > > + > > +#define SECBIT_SHOULD_EXEC_RESTRICT (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT)) > > +#define SECBIT_SHOULD_EXEC_RESTRICT_LOCKED \ > > + (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT_LOCKED)) > > + > > #define SECURE_ALL_BITS (issecure_mask(SECURE_NOROOT) | \ > > issecure_mask(SECURE_NO_SETUID_FIXUP) | \ > > issecure_mask(SECURE_KEEP_CAPS) | \ > > - issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE)) > > + issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \ > > + issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \ > > + issecure_mask(SECURE_SHOULD_EXEC_RESTRICT)) > > #define SECURE_ALL_LOCKS (SECURE_ALL_BITS << 1) > > > > +#define SECURE_ALL_UNPRIVILEGED (issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \ > > + issecure_mask(SECURE_SHOULD_EXEC_RESTRICT)) > > + > > #endif /* _UAPI_LINUX_SECUREBITS_H */ > > diff --git a/security/commoncap.c b/security/commoncap.c > > index 162d96b3a676..34b4493e2a69 100644 > > --- a/security/commoncap.c > > +++ b/security/commoncap.c > > @@ -117,6 +117,33 @@ int cap_settime(const struct timespec64 *ts, const struct timezone *tz) > > return 0; > > } > > > > +static bool ptrace_secbits_allowed(const struct cred *tracer, > > + const struct cred *tracee) > > +{ > > + const unsigned long tracer_secbits = SECURE_ALL_UNPRIVILEGED & > > + tracer->securebits; > > + const unsigned long tracee_secbits = SECURE_ALL_UNPRIVILEGED & > > + tracee->securebits; > > + /* Ignores locking of unset secure bits (cf. SECURE_ALL_LOCKS). */ > > + const unsigned long tracer_locked = (tracer_secbits << 1) & > > + tracer->securebits; > > + const unsigned long tracee_locked = (tracee_secbits << 1) & > > + tracee->securebits; > > + > > + /* The tracee must not have less constraints than the tracer. */ > > + if ((tracer_secbits | tracee_secbits) != tracee_secbits) > > + return false; > > + > > + /* > > + * Makes sure that the tracer's locks for restrictions are the same for > > + * the tracee. > > + */ > > + if ((tracer_locked | tracee_locked) != tracee_locked) > > + return false; > > + > > + return true; > > +} > > + > > /** > > * cap_ptrace_access_check - Determine whether the current process may access > > * another > > @@ -146,7 +173,8 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode) > > else > > caller_caps = &cred->cap_permitted; > > if (cred->user_ns == child_cred->user_ns && > > - cap_issubset(child_cred->cap_permitted, *caller_caps)) > > + cap_issubset(child_cred->cap_permitted, *caller_caps) && > > + ptrace_secbits_allowed(cred, child_cred)) > > goto out; > > if (ns_capable(child_cred->user_ns, CAP_SYS_PTRACE)) > > goto out; > > @@ -178,7 +206,8 @@ int cap_ptrace_traceme(struct task_struct *parent) > > cred = __task_cred(parent); > > child_cred = current_cred(); > > if (cred->user_ns == child_cred->user_ns && > > - cap_issubset(child_cred->cap_permitted, cred->cap_permitted)) > > + cap_issubset(child_cred->cap_permitted, cred->cap_permitted) && > > + ptrace_secbits_allowed(cred, child_cred)) > > goto out; > > if (has_ns_capability(parent, child_cred->user_ns, CAP_SYS_PTRACE)) > > goto out; > > @@ -1302,21 +1331,39 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3, > > & (old->securebits ^ arg2)) /*[1]*/ > > || ((old->securebits & SECURE_ALL_LOCKS & ~arg2)) /*[2]*/ > > || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS)) /*[3]*/ > > - || (cap_capable(current_cred(), > > - current_cred()->user_ns, > > - CAP_SETPCAP, > > - CAP_OPT_NONE) != 0) /*[4]*/ > > /* > > * [1] no changing of bits that are locked > > * [2] no unlocking of locks > > * [3] no setting of unsupported bits > > - * [4] doing anything requires privilege (go read about > > - * the "sendmail capabilities bug") > > */ > > ) > > /* cannot change a locked bit */ > > return -EPERM; > > > > + /* > > + * Doing anything requires privilege (go read about the > > + * "sendmail capabilities bug"), except for unprivileged bits. > > + * Indeed, the SECURE_ALL_UNPRIVILEGED bits are not > > + * restrictions enforced by the kernel but by user space on > > + * itself. The kernel is only in charge of protecting against > > + * privilege escalation with ptrace protections. > > + */ > > + if (cap_capable(current_cred(), current_cred()->user_ns, > > + CAP_SETPCAP, CAP_OPT_NONE) != 0) { > > + const unsigned long unpriv_and_locks = > > + SECURE_ALL_UNPRIVILEGED | > > + SECURE_ALL_UNPRIVILEGED << 1; > > + const unsigned long changed = old->securebits ^ arg2; > > + > > + /* For legacy reason, denies non-change. */ > > + if (!changed) > > + return -EPERM; > > + > > + /* Denies privileged changes. */ > > + if (changed & ~unpriv_and_locks) > > + return -EPERM; > > + } > > + > > new = prepare_creds(); > > if (!new) > > return -ENOMEM; > > -- > > 2.45.2 > >
On Mon, Jul 08, 2024 at 10:53:11AM -0700, Jeff Xu wrote: > On Mon, Jul 8, 2024 at 9:17 AM Jeff Xu <jeffxu@google.com> wrote: > > > > Hi > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and > > > their *_LOCKED counterparts are designed to be set by processes setting > > > up an execution environment, such as a user session, a container, or a > > > security sandbox. Like seccomp filters or Landlock domains, the > > > securebits are inherited across proceses. > > > > > > When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should > > > check executable resources with execveat(2) + AT_CHECK (see previous > > > patch). > > > > > > When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow > > > execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK). > > > > > Do we need both bits ? > > When CHECK is set and RESTRICT is not, the "check fail" executable > > will still get executed, so CHECK is for logging ? > > Does RESTRICT imply CHECK is set, e.g. What if CHECK=0 and RESTRICT = 1 ? > > > The intention might be "permissive mode"? if so, consider reuse > existing selinux's concept, and still with 2 bits: > SECBIT_SHOULD_EXEC_RESTRICT > SECBIT_SHOULD_EXEC_RESTRICT_PERMISSIVE SECBIT_SHOULD_EXEC_CHECK is for user space to check with execveat+AT_CHECK. SECBIT_SHOULD_EXEC_RESTRICT is for user space to restrict execution by default, and potentially allow some exceptions from the list of checked-and-allowed files, if SECBIT_SHOULD_EXEC_CHECK is set. Without SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT is to deny any kind of execution/interpretation. With only SECBIT_SHOULD_EXEC_CHECK, user space should just check and log any denied access, but ignore them. So yes, it is similar to the SELinux's permissive mode. This is explained in the next patch as comments. The *_LOCKED variants are useful and part of the securebits concept. > > > -Jeff > > > > > > > For a secure environment, we might also want > > > SECBIT_SHOULD_EXEC_CHECK_LOCKED and SECBIT_SHOULD_EXEC_RESTRICT_LOCKED > > > to be set. For a test environment (e.g. testing on a fleet to identify > > > potential issues), only the SECBIT_SHOULD_EXEC_CHECK* bits can be set to > > > still be able to identify potential issues (e.g. with interpreters logs > > > or LSMs audit entries). > > > > > > It should be noted that unlike other security bits, the > > > SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT bits are > > > dedicated to user space willing to restrict itself. Because of that, > > > they only make sense in the context of a trusted environment (e.g. > > > sandbox, container, user session, full system) where the process > > > changing its behavior (according to these bits) and all its parent > > > processes are trusted. Otherwise, any parent process could just execute > > > its own malicious code (interpreting a script or not), or even enforce a > > > seccomp filter to mask these bits. > > > > > > Such a secure environment can be achieved with an appropriate access > > > control policy (e.g. mount's noexec option, file access rights, LSM > > > configuration) and an enlighten ld.so checking that libraries are > > > allowed for execution e.g., to protect against illegitimate use of > > > LD_PRELOAD. > > > > > > Scripts may need some changes to deal with untrusted data (e.g. stdin, > > > environment variables), but that is outside the scope of the kernel. > > > > > > The only restriction enforced by the kernel is the right to ptrace > > > another process. Processes are denied to ptrace less restricted ones, > > > unless the tracer has CAP_SYS_PTRACE. This is mainly a safeguard to > > > avoid trivial privilege escalations e.g., by a debugging process being > > > abused with a confused deputy attack. > > > > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > > Cc: Christian Brauner <brauner@kernel.org> > > > Cc: Kees Cook <keescook@chromium.org> > > > Cc: Paul Moore <paul@paul-moore.com> > > > Signed-off-by: Mickaël Salaün <mic@digikod.net> > > > Link: https://lore.kernel.org/r/20240704190137.696169-3-mic@digikod.net > > > ---
On Mon, Jul 8, 2024 at 11:48 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Mon, Jul 08, 2024 at 10:53:11AM -0700, Jeff Xu wrote: > > On Mon, Jul 8, 2024 at 9:17 AM Jeff Xu <jeffxu@google.com> wrote: > > > > > > Hi > > > > > > On Thu, Jul 4, 2024 at 12:02 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > > > These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and > > > > their *_LOCKED counterparts are designed to be set by processes setting > > > > up an execution environment, such as a user session, a container, or a > > > > security sandbox. Like seccomp filters or Landlock domains, the > > > > securebits are inherited across proceses. > > > > > > > > When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should > > > > check executable resources with execveat(2) + AT_CHECK (see previous > > > > patch). > > > > > > > > When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow > > > > execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK). > > > > > > > Do we need both bits ? > > > When CHECK is set and RESTRICT is not, the "check fail" executable > > > will still get executed, so CHECK is for logging ? > > > Does RESTRICT imply CHECK is set, e.g. What if CHECK=0 and RESTRICT = 1 ? > > > > > The intention might be "permissive mode"? if so, consider reuse > > existing selinux's concept, and still with 2 bits: > > SECBIT_SHOULD_EXEC_RESTRICT > > SECBIT_SHOULD_EXEC_RESTRICT_PERMISSIVE > > SECBIT_SHOULD_EXEC_CHECK is for user space to check with execveat+AT_CHECK. > > SECBIT_SHOULD_EXEC_RESTRICT is for user space to restrict execution by > default, and potentially allow some exceptions from the list of > checked-and-allowed files, if SECBIT_SHOULD_EXEC_CHECK is set. > > Without SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT is to deny > any kind of execution/interpretation. > Do you mean "deny any kinds of executable/interpretation" or just those that failed with "AT_CHECK" ( I assume this)? > With only SECBIT_SHOULD_EXEC_CHECK, user space should just check and log > any denied access, but ignore them. So yes, it is similar to the > SELinux's permissive mode. > IIUC: CHECK=0, RESTRICT=0: do nothing, current behavior CHECK=1, RESTRICT=0: permissive mode - ignore AT_CHECK results. CHECK=0, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, no exception. CHECK=1, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, except those in the "checked-and-allowed" list. So CHECK is basically trying to form a allowlist? If there is a need for a allowlist, that is the task of "interruptor or dynamic linker" to maintain this list, and the list is known in advance, i.e. not something from execveat(AT_CHECK), and kernel shouldn't have the knowledge of this allowlist. Secondly, the concept of allow-list seems to be an attack factor for me, I would rather it be fully enforced, or permissive mode. And Check=1 and RESTRICT=1 is less secure than CHECK=0, RESTRICT=1, this might also be not obvious to dev. Unless I understood the CHECK wrong. > This is explained in the next patch as comments. > The next patch is a selftest patch, it is better to define them in the current commit and in the securebits.h. > The *_LOCKED variants are useful and part of the securebits concept. > The locked state is easy to understand. Thanks Best regards -Jeff > > > > > > -Jeff > > > > > > > > > > > > For a secure environment, we might also want > > > > SECBIT_SHOULD_EXEC_CHECK_LOCKED and SECBIT_SHOULD_EXEC_RESTRICT_LOCKED > > > > to be set. For a test environment (e.g. testing on a fleet to identify > > > > potential issues), only the SECBIT_SHOULD_EXEC_CHECK* bits can be set to > > > > still be able to identify potential issues (e.g. with interpreters logs > > > > or LSMs audit entries). > > > > > > > > It should be noted that unlike other security bits, the > > > > SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT bits are > > > > dedicated to user space willing to restrict itself. Because of that, > > > > they only make sense in the context of a trusted environment (e.g. > > > > sandbox, container, user session, full system) where the process > > > > changing its behavior (according to these bits) and all its parent > > > > processes are trusted. Otherwise, any parent process could just execute > > > > its own malicious code (interpreting a script or not), or even enforce a > > > > seccomp filter to mask these bits. > > > > > > > > Such a secure environment can be achieved with an appropriate access > > > > control policy (e.g. mount's noexec option, file access rights, LSM > > > > configuration) and an enlighten ld.so checking that libraries are > > > > allowed for execution e.g., to protect against illegitimate use of > > > > LD_PRELOAD. > > > > > > > > Scripts may need some changes to deal with untrusted data (e.g. stdin, > > > > environment variables), but that is outside the scope of the kernel. > > > > > > > > The only restriction enforced by the kernel is the right to ptrace > > > > another process. Processes are denied to ptrace less restricted ones, > > > > unless the tracer has CAP_SYS_PTRACE. This is mainly a safeguard to > > > > avoid trivial privilege escalations e.g., by a debugging process being > > > > abused with a confused deputy attack. > > > > > > > > Cc: Al Viro <viro@zeniv.linux.org.uk> > > > > Cc: Christian Brauner <brauner@kernel.org> > > > > Cc: Kees Cook <keescook@chromium.org> > > > > Cc: Paul Moore <paul@paul-moore.com> > > > > Signed-off-by: Mickaël Salaün <mic@digikod.net> > > > > Link: https://lore.kernel.org/r/20240704190137.696169-3-mic@digikod.net > > > > ---
On 08/07/2024 22:15, Jeff Xu wrote: > IIUC: > CHECK=0, RESTRICT=0: do nothing, current behavior > CHECK=1, RESTRICT=0: permissive mode - ignore AT_CHECK results. > CHECK=0, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, no exception. > CHECK=1, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, except > those in the "checked-and-allowed" list. I had much the same question for Mickaël while working on this. Essentially, "CHECK=0, RESTRICT=1" means to restrict without checking. In the context of a script or macro interpreter, this just means it will never interpret any scripts. Non-binary code execution is fully disabled in any part of the process that respects these bits. "CHECK=1, RESTRICT=1" means to restrict unless AT_CHECK passes. This case is the allow list (or whatever mechanism is being used to determine the result of an AT_CHECK check). The actual mechanism isn't the business of the script interpreter at all, it just has to refuse to execute anything that doesn't pass the check. So a generic interpreter can implement a generic mechanism and leave the specifics to whoever configures the machine. The other two case are more obvious. "CHECK=0, RESTRICT=0" is the zero-overhead case, while "CHECK=1, RESTRICT=0" might log, warn, or otherwise audit the result of the check, but it won't restrict execution. Cheers, Steve
On Mon, Jul 8, 2024 at 2:25 PM Steve Dower <steve.dower@python.org> wrote: > > On 08/07/2024 22:15, Jeff Xu wrote: > > IIUC: > > CHECK=0, RESTRICT=0: do nothing, current behavior > > CHECK=1, RESTRICT=0: permissive mode - ignore AT_CHECK results. > > CHECK=0, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, no exception. > > CHECK=1, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, except > > those in the "checked-and-allowed" list. > > I had much the same question for Mickaël while working on this. > > Essentially, "CHECK=0, RESTRICT=1" means to restrict without checking. > In the context of a script or macro interpreter, this just means it will > never interpret any scripts. Non-binary code execution is fully disabled > in any part of the process that respects these bits. > I see, so Mickaël does mean this will block all scripts. I guess, in the context of dynamic linker, this means: no more .so loading, even "dlopen" is called by an app ? But this will make the execve() fail. > "CHECK=1, RESTRICT=1" means to restrict unless AT_CHECK passes. This > case is the allow list (or whatever mechanism is being used to determine > the result of an AT_CHECK check). The actual mechanism isn't the > business of the script interpreter at all, it just has to refuse to > execute anything that doesn't pass the check. So a generic interpreter > can implement a generic mechanism and leave the specifics to whoever > configures the machine. > In the context of dynamic linker. this means: if .so passed the AT_CHECK, ldopen() can still load it. If .so fails the AT_CHECK, ldopen() will fail too. Thanks -Jeff > The other two case are more obvious. "CHECK=0, RESTRICT=0" is the > zero-overhead case, while "CHECK=1, RESTRICT=0" might log, warn, or > otherwise audit the result of the check, but it won't restrict execution. > > Cheers, > Steve
On Mon, Jul 08, 2024 at 03:07:24PM -0700, Jeff Xu wrote: > On Mon, Jul 8, 2024 at 2:25 PM Steve Dower <steve.dower@python.org> wrote: > > > > On 08/07/2024 22:15, Jeff Xu wrote: > > > IIUC: > > > CHECK=0, RESTRICT=0: do nothing, current behavior > > > CHECK=1, RESTRICT=0: permissive mode - ignore AT_CHECK results. > > > CHECK=0, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, no exception. > > > CHECK=1, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, except > > > those in the "checked-and-allowed" list. > > > > I had much the same question for Mickaël while working on this. > > > > Essentially, "CHECK=0, RESTRICT=1" means to restrict without checking. > > In the context of a script or macro interpreter, this just means it will > > never interpret any scripts. Non-binary code execution is fully disabled > > in any part of the process that respects these bits. > > > I see, so Mickaël does mean this will block all scripts. That is the initial idea. > I guess, in the context of dynamic linker, this means: no more .so > loading, even "dlopen" is called by an app ? But this will make the > execve() fail. Hmm, I'm not sure this "CHECK=0, RESTRICT=1" configuration would make sense for a dynamic linker except maybe if we want to only allow static binaries? The CHECK and RESTRICT securebits are designed to make it possible a "permissive mode" and an enforcement mode with the related locked securebits. This is why this "CHECK=0, RESTRICT=1" combination looks a bit weird. We can replace these securebits with others but I didn't find a better (and simple) option. I don't think this is an issue because with any security policy we can create unusable combinations. The three other combinations makes a lot of sense though. > > > "CHECK=1, RESTRICT=1" means to restrict unless AT_CHECK passes. This > > case is the allow list (or whatever mechanism is being used to determine > > the result of an AT_CHECK check). The actual mechanism isn't the > > business of the script interpreter at all, it just has to refuse to > > execute anything that doesn't pass the check. So a generic interpreter > > can implement a generic mechanism and leave the specifics to whoever > > configures the machine. > > > In the context of dynamic linker. this means: > if .so passed the AT_CHECK, ldopen() can still load it. > If .so fails the AT_CHECK, ldopen() will fail too. Correct > > Thanks > -Jeff > > > The other two case are more obvious. "CHECK=0, RESTRICT=0" is the > > zero-overhead case, while "CHECK=1, RESTRICT=0" might log, warn, or > > otherwise audit the result of the check, but it won't restrict execution. > > > > Cheers, > > Steve
On Tue, Jul 9, 2024 at 1:42 PM Mickaël Salaün <mic@digikod.net> wrote: > > On Mon, Jul 08, 2024 at 03:07:24PM -0700, Jeff Xu wrote: > > On Mon, Jul 8, 2024 at 2:25 PM Steve Dower <steve.dower@python.org> wrote: > > > > > > On 08/07/2024 22:15, Jeff Xu wrote: > > > > IIUC: > > > > CHECK=0, RESTRICT=0: do nothing, current behavior > > > > CHECK=1, RESTRICT=0: permissive mode - ignore AT_CHECK results. > > > > CHECK=0, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, no exception. > > > > CHECK=1, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, except > > > > those in the "checked-and-allowed" list. > > > > > > I had much the same question for Mickaël while working on this. > > > > > > Essentially, "CHECK=0, RESTRICT=1" means to restrict without checking. > > > In the context of a script or macro interpreter, this just means it will > > > never interpret any scripts. Non-binary code execution is fully disabled > > > in any part of the process that respects these bits. > > > > > I see, so Mickaël does mean this will block all scripts. > > That is the initial idea. > > > I guess, in the context of dynamic linker, this means: no more .so > > loading, even "dlopen" is called by an app ? But this will make the > > execve() fail. > > Hmm, I'm not sure this "CHECK=0, RESTRICT=1" configuration would make > sense for a dynamic linker except maybe if we want to only allow static > binaries? > > The CHECK and RESTRICT securebits are designed to make it possible a > "permissive mode" and an enforcement mode with the related locked > securebits. This is why this "CHECK=0, RESTRICT=1" combination looks a > bit weird. We can replace these securebits with others but I didn't > find a better (and simple) option. I don't think this is an issue > because with any security policy we can create unusable combinations. > The three other combinations makes a lot of sense though. > If we need only handle 3 combinations, I would think something like below is easier to understand, and don't have wield state like CHECK=0, RESTRICT=1 XX_RESTRICT: when true: Perform the AT_CHECK, and deny the executable after AT_CHECK fails. XX_RESTRICT_PERMISSIVE: take effect when XX_RESTRICT is true. True means Ignoring the AT_CHECK result. Or XX_CHECK: when true: Perform the AT_CHECK. XX_CHECK_ENFORCE takes effect only when XX_CHECK is true. True means restrict the executable when AT_CHECK failed; false means ignore the AT_CHECK failure. Of course, we can replace XX_CHECK_ENFORCE with XX_RESTRICT. Personally I think having _CHECK_ in the name implies the XX_CHECK needs to be true as a prerequisite for this flag , but that is my opinion only. As long as the semantics are clear as part of the comments of definition in code, it is fine. Thanks -Jeff > > > > > "CHECK=1, RESTRICT=1" means to restrict unless AT_CHECK passes. This > > > case is the allow list (or whatever mechanism is being used to determine > > > the result of an AT_CHECK check). The actual mechanism isn't the > > > business of the script interpreter at all, it just has to refuse to > > > execute anything that doesn't pass the check. So a generic interpreter > > > can implement a generic mechanism and leave the specifics to whoever > > > configures the machine. > > > > > In the context of dynamic linker. this means: > > if .so passed the AT_CHECK, ldopen() can still load it. > > If .so fails the AT_CHECK, ldopen() will fail too. > > Correct > > > > > Thanks > > -Jeff > > > > > The other two case are more obvious. "CHECK=0, RESTRICT=0" is the > > > zero-overhead case, while "CHECK=1, RESTRICT=0" might log, warn, or > > > otherwise audit the result of the check, but it won't restrict execution. > > > > > > Cheers, > > > Steve
On Tue, Jul 09, 2024 at 02:57:43PM -0700, Jeff Xu wrote: > On Tue, Jul 9, 2024 at 1:42 PM Mickaël Salaün <mic@digikod.net> wrote: > > > > On Mon, Jul 08, 2024 at 03:07:24PM -0700, Jeff Xu wrote: > > > On Mon, Jul 8, 2024 at 2:25 PM Steve Dower <steve.dower@python.org> wrote: > > > > > > > > On 08/07/2024 22:15, Jeff Xu wrote: > > > > > IIUC: > > > > > CHECK=0, RESTRICT=0: do nothing, current behavior > > > > > CHECK=1, RESTRICT=0: permissive mode - ignore AT_CHECK results. > > > > > CHECK=0, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, no exception. > > > > > CHECK=1, RESTRICT=1: call AT_CHECK, deny if AT_CHECK failed, except > > > > > those in the "checked-and-allowed" list. > > > > > > > > I had much the same question for Mickaël while working on this. > > > > > > > > Essentially, "CHECK=0, RESTRICT=1" means to restrict without checking. > > > > In the context of a script or macro interpreter, this just means it will > > > > never interpret any scripts. Non-binary code execution is fully disabled > > > > in any part of the process that respects these bits. > > > > > > > I see, so Mickaël does mean this will block all scripts. > > > > That is the initial idea. > > > > > I guess, in the context of dynamic linker, this means: no more .so > > > loading, even "dlopen" is called by an app ? But this will make the > > > execve() fail. > > > > Hmm, I'm not sure this "CHECK=0, RESTRICT=1" configuration would make > > sense for a dynamic linker except maybe if we want to only allow static > > binaries? > > > > The CHECK and RESTRICT securebits are designed to make it possible a > > "permissive mode" and an enforcement mode with the related locked > > securebits. This is why this "CHECK=0, RESTRICT=1" combination looks a > > bit weird. We can replace these securebits with others but I didn't > > find a better (and simple) option. I don't think this is an issue > > because with any security policy we can create unusable combinations. > > The three other combinations makes a lot of sense though. > > > If we need only handle 3 combinations, I would think something like > below is easier to understand, and don't have wield state like > CHECK=0, RESTRICT=1 The "CHECK=0, RESTRICT=1" is useful for script interpreter instances that should not interpret any command from users e.g., but only execute script files. > > XX_RESTRICT: when true: Perform the AT_CHECK, and deny the executable > after AT_CHECK fails. > XX_RESTRICT_PERMISSIVE: take effect when XX_RESTRICT is true. True > means Ignoring the AT_CHECK result. We get a similar weird state with XX_RESTRICT_PERMISSIVE=1 and XX_RESTRICT=0 As a side note, for compatibility reasons, by default all securebits must be 0, and this must translate to no restriction. > > Or > > XX_CHECK: when true: Perform the AT_CHECK. > XX_CHECK_ENFORCE takes effect only when XX_CHECK is true. True means > restrict the executable when AT_CHECK failed; false means ignore the > AT_CHECK failure. We get a similar weird state with XX_CHECK_ENFORCE=1 and XX_CHECK=0 > > Of course, we can replace XX_CHECK_ENFORCE with XX_RESTRICT. > Personally I think having _CHECK_ in the name implies the XX_CHECK > needs to be true as a prerequisite for this flag , but that is my > opinion only. As long as the semantics are clear as part of the > comments of definition in code, it is fine. Here is another proposal: We can change a bit the semantic by making it the norm to always check file executability with AT_CHECK, and using the securebits to restrict file interpretation and/or command injection (e.g. user supplied shell commands). Non-executable checked files can be reported/logged at the kernel level, with audit, configured by sysadmins. New securebits (feel free to propose better names): - SECBIT_EXEC_RESTRICT_FILE: requires AT_CHECK to pass. - SECBIT_EXEC_DENY_INTERACTIVE: deny any command injection via command line arguments, environment variables, or configuration files. This should be ignored by dynamic linkers. We could also have an allow-list of shells for which this bit is not set, managed by an LSM's policy, if the native securebits scoping approach is not enough. Different modes for script interpreters: 1. RESTRICT_FILE=0 DENY_INTERACTIVE=0 (default) Always interpret scripts, and allow arbitrary user commands. => No threat, everyone and everything is trusted, but we can get ahead of potential issues with logs to prepare for a migration to a restrictive mode. 2. RESTRICT_FILE=1 DENY_INTERACTIVE=0 Deny script interpretation if they are not executable, and allow arbitrary user commands. => Threat: (potential) malicious scripts run by trusted (and not fooled) users. That could protect against unintended script executions (e.g. sh /tmp/*.sh). ==> Makes sense for (semi-restricted) user sessions. 3. RESTRICT_FILE=1 DENY_INTERACTIVE=1 Deny script interpretation if they are not executable, and also deny any arbitrary user commands. => Threat: malicious scripts run by untrusted users. ==> Makes sense for system services executing scripts. 4. RESTRICT_FILE=0 DENY_INTERACTIVE=1 Always interpret scripts, but deny arbitrary user commands. => Goal: monitor/measure/assess script content (e.g. with IMA/EVM) in a system where the access rights are not (yet) ready. Arbitrary user commands would be much more difficult to monitor. ==> First step of restricting system services that should not directly pass arbitrary commands to shells. > > Thanks > -Jeff > > > > > > > > > "CHECK=1, RESTRICT=1" means to restrict unless AT_CHECK passes. This > > > > case is the allow list (or whatever mechanism is being used to determine > > > > the result of an AT_CHECK check). The actual mechanism isn't the > > > > business of the script interpreter at all, it just has to refuse to > > > > execute anything that doesn't pass the check. So a generic interpreter > > > > can implement a generic mechanism and leave the specifics to whoever > > > > configures the machine. > > > > > > > In the context of dynamic linker. this means: > > > if .so passed the AT_CHECK, ldopen() can still load it. > > > If .so fails the AT_CHECK, ldopen() will fail too. > > > > Correct > > > > > > > > Thanks > > > -Jeff > > > > > > > The other two case are more obvious. "CHECK=0, RESTRICT=0" is the > > > > zero-overhead case, while "CHECK=1, RESTRICT=0" might log, warn, or > > > > otherwise audit the result of the check, but it won't restrict execution. > > > > > > > > Cheers, > > > > Steve
On Wed, Jul 10, 2024 at 11:58:25AM +0200, Mickaël Salaün wrote: > Here is another proposal: > > We can change a bit the semantic by making it the norm to always check > file executability with AT_CHECK, and using the securebits to restrict > file interpretation and/or command injection (e.g. user supplied shell > commands). Non-executable checked files can be reported/logged at the > kernel level, with audit, configured by sysadmins. > > New securebits (feel free to propose better names): > > - SECBIT_EXEC_RESTRICT_FILE: requires AT_CHECK to pass. Would you want the enforcement of this bit done by userspace or the kernel? IIUC, userspace would always perform AT_CHECK regardless of SECBIT_EXEC_RESTRICT_FILE, and then which would happen? 1) userspace would ignore errors from AT_CHECK when SECBIT_EXEC_RESTRICT_FILE is unset or 2) kernel would allow all AT_CHECK when SECBIT_EXEC_RESTRICT_FILE is unset I suspect 1 is best and what you intend, given that SECBIT_EXEC_DENY_INTERACTIVE can only be enforced by userspace. > - SECBIT_EXEC_DENY_INTERACTIVE: deny any command injection via > command line arguments, environment variables, or configuration files. > This should be ignored by dynamic linkers. We could also have an > allow-list of shells for which this bit is not set, managed by an > LSM's policy, if the native securebits scoping approach is not enough. > > Different modes for script interpreters: > > 1. RESTRICT_FILE=0 DENY_INTERACTIVE=0 (default) > Always interpret scripts, and allow arbitrary user commands. > => No threat, everyone and everything is trusted, but we can get > ahead of potential issues with logs to prepare for a migration to a > restrictive mode. > > 2. RESTRICT_FILE=1 DENY_INTERACTIVE=0 > Deny script interpretation if they are not executable, and allow > arbitrary user commands. > => Threat: (potential) malicious scripts run by trusted (and not > fooled) users. That could protect against unintended script > executions (e.g. sh /tmp/*.sh). > ==> Makes sense for (semi-restricted) user sessions. > > 3. RESTRICT_FILE=1 DENY_INTERACTIVE=1 > Deny script interpretation if they are not executable, and also deny > any arbitrary user commands. > => Threat: malicious scripts run by untrusted users. > ==> Makes sense for system services executing scripts. > > 4. RESTRICT_FILE=0 DENY_INTERACTIVE=1 > Always interpret scripts, but deny arbitrary user commands. > => Goal: monitor/measure/assess script content (e.g. with IMA/EVM) in > a system where the access rights are not (yet) ready. Arbitrary > user commands would be much more difficult to monitor. > ==> First step of restricting system services that should not > directly pass arbitrary commands to shells. I like these bits!
On 10/07/2024 10:58, Mickaël Salaün wrote: > On Tue, Jul 09, 2024 at 02:57:43PM -0700, Jeff Xu wrote: >>> Hmm, I'm not sure this "CHECK=0, RESTRICT=1" configuration would make >>> sense for a dynamic linker except maybe if we want to only allow static >>> binaries? >>> >>> The CHECK and RESTRICT securebits are designed to make it possible a >>> "permissive mode" and an enforcement mode with the related locked >>> securebits. This is why this "CHECK=0, RESTRICT=1" combination looks a >>> bit weird. We can replace these securebits with others but I didn't >>> find a better (and simple) option. I don't think this is an issue >>> because with any security policy we can create unusable combinations. >>> The three other combinations makes a lot of sense though. >>> >> If we need only handle 3 combinations, I would think something like >> below is easier to understand, and don't have wield state like >> CHECK=0, RESTRICT=1 > > The "CHECK=0, RESTRICT=1" is useful for script interpreter instances > that should not interpret any command from users e.g., but only execute > script files. I see this case as being most relevant to something that doesn't usually need any custom scripts, but may have it. For example, macros in a document, or pre/post-install scripts for a package manager. For something whose sole purpose is to execute scripts, it doesn't make much sense. But there are other cases that can be reasonably controlled with this option. Cheers, Steve
On Wed, Jul 10, 2024 at 09:26:14AM -0700, Kees Cook wrote: > On Wed, Jul 10, 2024 at 11:58:25AM +0200, Mickaël Salaün wrote: > > Here is another proposal: > > > > We can change a bit the semantic by making it the norm to always check > > file executability with AT_CHECK, and using the securebits to restrict > > file interpretation and/or command injection (e.g. user supplied shell > > commands). Non-executable checked files can be reported/logged at the > > kernel level, with audit, configured by sysadmins. > > > > New securebits (feel free to propose better names): > > > > - SECBIT_EXEC_RESTRICT_FILE: requires AT_CHECK to pass. > > Would you want the enforcement of this bit done by userspace or the > kernel? > > IIUC, userspace would always perform AT_CHECK regardless of > SECBIT_EXEC_RESTRICT_FILE, and then which would happen? > > 1) userspace would ignore errors from AT_CHECK when > SECBIT_EXEC_RESTRICT_FILE is unset Yes, that's the idea. > > or > > 2) kernel would allow all AT_CHECK when SECBIT_EXEC_RESTRICT_FILE is > unset > > I suspect 1 is best and what you intend, given that > SECBIT_EXEC_DENY_INTERACTIVE can only be enforced by userspace. Indeed. We don't want AT_CHECK's behavior to change according to securebits. > > > - SECBIT_EXEC_DENY_INTERACTIVE: deny any command injection via > > command line arguments, environment variables, or configuration files. > > This should be ignored by dynamic linkers. We could also have an > > allow-list of shells for which this bit is not set, managed by an > > LSM's policy, if the native securebits scoping approach is not enough. > > > > Different modes for script interpreters: > > > > 1. RESTRICT_FILE=0 DENY_INTERACTIVE=0 (default) > > Always interpret scripts, and allow arbitrary user commands. > > => No threat, everyone and everything is trusted, but we can get > > ahead of potential issues with logs to prepare for a migration to a > > restrictive mode. > > > > 2. RESTRICT_FILE=1 DENY_INTERACTIVE=0 > > Deny script interpretation if they are not executable, and allow > > arbitrary user commands. > > => Threat: (potential) malicious scripts run by trusted (and not > > fooled) users. That could protect against unintended script > > executions (e.g. sh /tmp/*.sh). > > ==> Makes sense for (semi-restricted) user sessions. > > > > 3. RESTRICT_FILE=1 DENY_INTERACTIVE=1 > > Deny script interpretation if they are not executable, and also deny > > any arbitrary user commands. > > => Threat: malicious scripts run by untrusted users. > > ==> Makes sense for system services executing scripts. > > > > 4. RESTRICT_FILE=0 DENY_INTERACTIVE=1 > > Always interpret scripts, but deny arbitrary user commands. > > => Goal: monitor/measure/assess script content (e.g. with IMA/EVM) in > > a system where the access rights are not (yet) ready. Arbitrary > > user commands would be much more difficult to monitor. > > ==> First step of restricting system services that should not > > directly pass arbitrary commands to shells. > > I like these bits! Good! Jeff, Steve, Florian, Matt, others, what do you think?
On Thu, Jul 11, 2024 at 1:57 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Wed, Jul 10, 2024 at 09:26:14AM -0700, Kees Cook wrote: > > On Wed, Jul 10, 2024 at 11:58:25AM +0200, Mickaël Salaün wrote: > > > Here is another proposal: > > > > > > We can change a bit the semantic by making it the norm to always check > > > file executability with AT_CHECK, and using the securebits to restrict > > > file interpretation and/or command injection (e.g. user supplied shell > > > commands). Non-executable checked files can be reported/logged at the > > > kernel level, with audit, configured by sysadmins. > > > > > > New securebits (feel free to propose better names): > > > > > > - SECBIT_EXEC_RESTRICT_FILE: requires AT_CHECK to pass. > > > > Would you want the enforcement of this bit done by userspace or the > > kernel? > > > > IIUC, userspace would always perform AT_CHECK regardless of > > SECBIT_EXEC_RESTRICT_FILE, and then which would happen? > > > > 1) userspace would ignore errors from AT_CHECK when > > SECBIT_EXEC_RESTRICT_FILE is unset > > Yes, that's the idea. > > > > > or > > > > 2) kernel would allow all AT_CHECK when SECBIT_EXEC_RESTRICT_FILE is > > unset > > > > I suspect 1 is best and what you intend, given that > > SECBIT_EXEC_DENY_INTERACTIVE can only be enforced by userspace. > > Indeed. We don't want AT_CHECK's behavior to change according to > securebits. > One bit is good. > > > > > - SECBIT_EXEC_DENY_INTERACTIVE: deny any command injection via > > > command line arguments, environment variables, or configuration files. > > > This should be ignored by dynamic linkers. We could also have an > > > allow-list of shells for which this bit is not set, managed by an > > > LSM's policy, if the native securebits scoping approach is not enough. > > > > > > Different modes for script interpreters: > > > > > > 1. RESTRICT_FILE=0 DENY_INTERACTIVE=0 (default) > > > Always interpret scripts, and allow arbitrary user commands. > > > => No threat, everyone and everything is trusted, but we can get > > > ahead of potential issues with logs to prepare for a migration to a > > > restrictive mode. > > > > > > 2. RESTRICT_FILE=1 DENY_INTERACTIVE=0 > > > Deny script interpretation if they are not executable, and allow > > > arbitrary user commands. > > > => Threat: (potential) malicious scripts run by trusted (and not > > > fooled) users. That could protect against unintended script > > > executions (e.g. sh /tmp/*.sh). > > > ==> Makes sense for (semi-restricted) user sessions. > > > > > > 3. RESTRICT_FILE=1 DENY_INTERACTIVE=1 > > > Deny script interpretation if they are not executable, and also deny > > > any arbitrary user commands. > > > => Threat: malicious scripts run by untrusted users. > > > ==> Makes sense for system services executing scripts. > > > > > > 4. RESTRICT_FILE=0 DENY_INTERACTIVE=1 > > > Always interpret scripts, but deny arbitrary user commands. > > > => Goal: monitor/measure/assess script content (e.g. with IMA/EVM) in > > > a system where the access rights are not (yet) ready. Arbitrary > > > user commands would be much more difficult to monitor. > > > ==> First step of restricting system services that should not > > > directly pass arbitrary commands to shells. > > > > I like these bits! > > Good! Jeff, Steve, Florian, Matt, others, what do you think? For below two cases: will they be restricted by one (or some) mode above ? 1> cat /tmp/a.sh | sh 2> sh -c "$(cat /tmp/a.sh)"
On 16/07/2024 16:02, Jeff Xu wrote: > For below two cases: will they be restricted by one (or some) mode above ? > > 1> cat /tmp/a.sh | sh > > 2> sh -c "$(cat /tmp/a.sh)" It will almost certainly depend on your context, but to properly lock down a system, they must be restricted. "We were unable to check the file" ought to be treated the same as "the file failed the check". If your goal is to only execute files that have been pre-approved in some manner, you're implying that you don't want interactive execution at all (since that is not a file that's been pre-approved). So a mere "sh" or "sh -c ..." would be restricted without checking anything other than the secure bit. Cheers, Steve
On Tue, Jul 16, 2024 at 08:02:37AM -0700, Jeff Xu wrote: > On Thu, Jul 11, 2024 at 1:57 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > On Wed, Jul 10, 2024 at 09:26:14AM -0700, Kees Cook wrote: > > > On Wed, Jul 10, 2024 at 11:58:25AM +0200, Mickaël Salaün wrote: > > > > Here is another proposal: > > > > > > > > We can change a bit the semantic by making it the norm to always check > > > > file executability with AT_CHECK, and using the securebits to restrict > > > > file interpretation and/or command injection (e.g. user supplied shell > > > > commands). Non-executable checked files can be reported/logged at the > > > > kernel level, with audit, configured by sysadmins. > > > > > > > > New securebits (feel free to propose better names): > > > > > > > > - SECBIT_EXEC_RESTRICT_FILE: requires AT_CHECK to pass. > > > > > > Would you want the enforcement of this bit done by userspace or the > > > kernel? > > > > > > IIUC, userspace would always perform AT_CHECK regardless of > > > SECBIT_EXEC_RESTRICT_FILE, and then which would happen? > > > > > > 1) userspace would ignore errors from AT_CHECK when > > > SECBIT_EXEC_RESTRICT_FILE is unset > > > > Yes, that's the idea. > > > > > > > > or > > > > > > 2) kernel would allow all AT_CHECK when SECBIT_EXEC_RESTRICT_FILE is > > > unset > > > > > > I suspect 1 is best and what you intend, given that > > > SECBIT_EXEC_DENY_INTERACTIVE can only be enforced by userspace. > > > > Indeed. We don't want AT_CHECK's behavior to change according to > > securebits. > > > One bit is good. > > > > > > > > - SECBIT_EXEC_DENY_INTERACTIVE: deny any command injection via > > > > command line arguments, environment variables, or configuration files. > > > > This should be ignored by dynamic linkers. We could also have an > > > > allow-list of shells for which this bit is not set, managed by an > > > > LSM's policy, if the native securebits scoping approach is not enough. > > > > > > > > Different modes for script interpreters: > > > > > > > > 1. RESTRICT_FILE=0 DENY_INTERACTIVE=0 (default) > > > > Always interpret scripts, and allow arbitrary user commands. > > > > => No threat, everyone and everything is trusted, but we can get > > > > ahead of potential issues with logs to prepare for a migration to a > > > > restrictive mode. > > > > > > > > 2. RESTRICT_FILE=1 DENY_INTERACTIVE=0 > > > > Deny script interpretation if they are not executable, and allow > > > > arbitrary user commands. > > > > => Threat: (potential) malicious scripts run by trusted (and not > > > > fooled) users. That could protect against unintended script > > > > executions (e.g. sh /tmp/*.sh). > > > > ==> Makes sense for (semi-restricted) user sessions. > > > > > > > > 3. RESTRICT_FILE=1 DENY_INTERACTIVE=1 > > > > Deny script interpretation if they are not executable, and also deny > > > > any arbitrary user commands. > > > > => Threat: malicious scripts run by untrusted users. > > > > ==> Makes sense for system services executing scripts. > > > > > > > > 4. RESTRICT_FILE=0 DENY_INTERACTIVE=1 > > > > Always interpret scripts, but deny arbitrary user commands. > > > > => Goal: monitor/measure/assess script content (e.g. with IMA/EVM) in > > > > a system where the access rights are not (yet) ready. Arbitrary > > > > user commands would be much more difficult to monitor. > > > > ==> First step of restricting system services that should not > > > > directly pass arbitrary commands to shells. > > > > > > I like these bits! > > > > Good! Jeff, Steve, Florian, Matt, others, what do you think? > > For below two cases: will they be restricted by one (or some) mode above ? > > 1> cat /tmp/a.sh | sh > > 2> sh -c "$(cat /tmp/a.sh)" Yes, DENY_INTERACTIVE=1 is to deny both of these cases (i.e. arbitrary user command). These other examples should be allowed with AT_CHECK and RESTRICT_FILE=1 if a.sh is executable though: * sh /tmp/a.sh * sh < /tmp/a.sh
On Tue, Jul 16, 2024 at 8:15 AM Mickaël Salaün <mic@digikod.net> wrote: > > On Tue, Jul 16, 2024 at 08:02:37AM -0700, Jeff Xu wrote: > > On Thu, Jul 11, 2024 at 1:57 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > > > On Wed, Jul 10, 2024 at 09:26:14AM -0700, Kees Cook wrote: > > > > On Wed, Jul 10, 2024 at 11:58:25AM +0200, Mickaël Salaün wrote: > > > > > Here is another proposal: > > > > > > > > > > We can change a bit the semantic by making it the norm to always check > > > > > file executability with AT_CHECK, and using the securebits to restrict > > > > > file interpretation and/or command injection (e.g. user supplied shell > > > > > commands). Non-executable checked files can be reported/logged at the > > > > > kernel level, with audit, configured by sysadmins. > > > > > > > > > > New securebits (feel free to propose better names): > > > > > > > > > > - SECBIT_EXEC_RESTRICT_FILE: requires AT_CHECK to pass. > > > > > > > > Would you want the enforcement of this bit done by userspace or the > > > > kernel? > > > > > > > > IIUC, userspace would always perform AT_CHECK regardless of > > > > SECBIT_EXEC_RESTRICT_FILE, and then which would happen? > > > > > > > > 1) userspace would ignore errors from AT_CHECK when > > > > SECBIT_EXEC_RESTRICT_FILE is unset > > > > > > Yes, that's the idea. > > > > > > > > > > > or > > > > > > > > 2) kernel would allow all AT_CHECK when SECBIT_EXEC_RESTRICT_FILE is > > > > unset > > > > > > > > I suspect 1 is best and what you intend, given that > > > > SECBIT_EXEC_DENY_INTERACTIVE can only be enforced by userspace. > > > > > > Indeed. We don't want AT_CHECK's behavior to change according to > > > securebits. > > > > > One bit is good. > > > > > > > > > > > - SECBIT_EXEC_DENY_INTERACTIVE: deny any command injection via > > > > > command line arguments, environment variables, or configuration files. > > > > > This should be ignored by dynamic linkers. We could also have an > > > > > allow-list of shells for which this bit is not set, managed by an > > > > > LSM's policy, if the native securebits scoping approach is not enough. > > > > > > > > > > Different modes for script interpreters: > > > > > > > > > > 1. RESTRICT_FILE=0 DENY_INTERACTIVE=0 (default) > > > > > Always interpret scripts, and allow arbitrary user commands. > > > > > => No threat, everyone and everything is trusted, but we can get > > > > > ahead of potential issues with logs to prepare for a migration to a > > > > > restrictive mode. > > > > > > > > > > 2. RESTRICT_FILE=1 DENY_INTERACTIVE=0 > > > > > Deny script interpretation if they are not executable, and allow > > > > > arbitrary user commands. > > > > > => Threat: (potential) malicious scripts run by trusted (and not > > > > > fooled) users. That could protect against unintended script > > > > > executions (e.g. sh /tmp/*.sh). > > > > > ==> Makes sense for (semi-restricted) user sessions. > > > > > > > > > > 3. RESTRICT_FILE=1 DENY_INTERACTIVE=1 > > > > > Deny script interpretation if they are not executable, and also deny > > > > > any arbitrary user commands. > > > > > => Threat: malicious scripts run by untrusted users. > > > > > ==> Makes sense for system services executing scripts. > > > > > > > > > > 4. RESTRICT_FILE=0 DENY_INTERACTIVE=1 > > > > > Always interpret scripts, but deny arbitrary user commands. > > > > > => Goal: monitor/measure/assess script content (e.g. with IMA/EVM) in > > > > > a system where the access rights are not (yet) ready. Arbitrary > > > > > user commands would be much more difficult to monitor. > > > > > ==> First step of restricting system services that should not > > > > > directly pass arbitrary commands to shells. > > > > > > > > I like these bits! > > > > > > Good! Jeff, Steve, Florian, Matt, others, what do you think? > > > > For below two cases: will they be restricted by one (or some) mode above ? > > > > 1> cat /tmp/a.sh | sh > > > > 2> sh -c "$(cat /tmp/a.sh)" > > Yes, DENY_INTERACTIVE=1 is to deny both of these cases (i.e. arbitrary > user command). > > These other examples should be allowed with AT_CHECK and RESTRICT_FILE=1 > if a.sh is executable though: > * sh /tmp/a.sh > * sh < /tmp/a.sh That looks good. Thanks for clarifying.
On Sat, 2024-07-06 at 16:56 +0200, Mickaël Salaün wrote: > On Fri, Jul 05, 2024 at 02:44:03PM -0700, Kees Cook wrote: > > On Fri, Jul 05, 2024 at 07:54:16PM +0200, Mickaël Salaün wrote: > > > On Thu, Jul 04, 2024 at 05:18:04PM -0700, Kees Cook wrote: > > > > On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote: > > > > > Such a secure environment can be achieved with an appropriate access > > > > > control policy (e.g. mount's noexec option, file access rights, LSM > > > > > configuration) and an enlighten ld.so checking that libraries are > > > > > allowed for execution e.g., to protect against illegitimate use of > > > > > LD_PRELOAD. > > > > > > > > > > Scripts may need some changes to deal with untrusted data (e.g. stdin, > > > > > environment variables), but that is outside the scope of the kernel. > > > > > > > > If the threat model includes an attacker sitting at a shell prompt, we > > > > need to be very careful about how process perform enforcement. E.g. even > > > > on a locked down system, if an attacker has access to LD_PRELOAD or a > > > > > > LD_PRELOAD should be OK once ld.so will be patched to check the > > > libraries. We can still imagine a debug library used to bypass security > > > checks, but in this case the issue would be that this library is > > > executable in the first place. > > > > Ah yes, that's fair: the shell would discover the malicious library > > while using AT_CHECK during resolution of the LD_PRELOAD. > > That's the idea, but it would be checked by ld.so, not the shell. > > > > > > > seccomp wrapper (which you both mention here), it would be possible to > > > > run commands where the resulting process is tricked into thinking it > > > > doesn't have the bits set. > > > > > > As explained in the UAPI comments, all parent processes need to be > > > trusted. This meeans that their code is trusted, their seccomp filters > > > are trusted, and that they are patched, if needed, to check file > > > executability. > > > > But we have launchers that apply arbitrary seccomp policy, e.g. minijail > > on Chrome OS, or even systemd on regular distros. In theory, this should > > be handled via other ACLs. > > Processes running with untrusted seccomp filter should be considered > untrusted. It would then make sense for these seccomp filters/programs > to be considered executable code, and then for minijail and systemd to > check them with AT_CHECK (according to the securebits policy). > > > > > > > But this would be exactly true for calling execveat(): LD_PRELOAD or > > > > seccomp policy could have it just return 0. > > > > > > If an attacker is allowed/able to load an arbitrary seccomp filter on a > > > process, we cannot trust this process. > > > > > > > > > > > While I like AT_CHECK, I do wonder if it's better to do the checks via > > > > open(), as was originally designed with O_MAYEXEC. Because then > > > > enforcement is gated by the kernel -- the process does not get a file > > > > descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into > > > > doing. > > > > > > Being able to check a path name or a file descriptor (with the same > > > syscall) is more flexible and cover more use cases. > > > > If flexibility costs us reliability, I think that flexibility is not > > a benefit. > > Well, it's a matter of letting user space do what they think is best, > and I think there are legitimate and safe uses of path names, even if I > agree that this should not be used in most use cases. Would we want > faccessat2(2) to only take file descriptor as argument and not file > path? I don't think so but I'd defer to the VFS maintainers. > > Christian, Al, Linus? > > Steve, could you share a use case with file paths? > > > > > > The execveat(2) > > > interface, including current and future flags, is dedicated to file > > > execution. I then think that using execveat(2) for this kind of check > > > makes more sense, and will easily evolve with this syscall. > > > > Yeah, I do recognize that is feels much more natural, but I remain > > unhappy about how difficult it will become to audit a system for safety > > when the check is strictly per-process opt-in, and not enforced by the > > kernel for a given process tree. But, I think this may have always been > > a fiction in my mind. :) > > Hmm, I'm not sure to follow. Securebits are inherited, so process tree. > And we need the parent processes to be trusted anyway. > > > > > > > And this thinking also applies to faccessat() too: if a process can be > > > > tricked into thinking the access check passed, it'll happily interpret > > > > whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC > > > > is being checked seems substantially safer to me... > > > > > > If attackers can filter execveat(2), they can also filter open(2) and > > > any other syscalls. In all cases, that would mean an issue in the > > > security policy. > > > > Hm, as in, make a separate call to open(2) without O_MAYEXEC, and pass > > that fd back to the filtered open(2) that did have O_MAYEXEC. Yes, true. > > > > I guess it does become morally equivalent. > > > > Okay. Well, let me ask about usability. Right now, a process will need > > to do: > > > > - should I use AT_CHECK? (check secbit) > > - if yes: perform execveat(AT_CHECK) > > > > Why not leave the secbit test up to the kernel, and then the program can > > just unconditionally call execveat(AT_CHECK)? > > That was kind of the approach of the previous patch series and Linus > wanted the new interface to follow the kernel semantic. Enforcing this > kind of restriction will always be the duty of user space anyway, so I > think it's simpler (i.e. no mix of policy definition, access check, and > policy enforcement, but a standalone execveat feature), more flexible, > and it fully delegates the policy enforcement to user space instead of > trying to enforce some part in the kernel which would only give the > illusion of security/policy enforcement. A problem could be that from IMA perspective there is no indication on whether the interpreter executed or not execveat(). Sure, we can detect that the binary supports it, but if the enforcement was enabled/disabled that it is not recorded. Maybe, setting the process flags should be influenced by the kernel, for example not allowing changes and enforcing when there is an IMA policy loaded requiring to measure/appraise scripts. Roberto > > > > Though perhaps the issue here is that an execveat() EINVAL doesn't > > tell the program if AT_CHECK is unimplemented or if something else > > went wrong, and the secbit prctl() will give the correct signal about > > AT_CHECK availability? > > This kind of check could indeed help to identify the issue.
On Thu, Jul 18, 2024 at 04:16:45PM +0200, Roberto Sassu wrote: > On Sat, 2024-07-06 at 16:56 +0200, Mickaël Salaün wrote: > > On Fri, Jul 05, 2024 at 02:44:03PM -0700, Kees Cook wrote: > > > On Fri, Jul 05, 2024 at 07:54:16PM +0200, Mickaël Salaün wrote: > > > > On Thu, Jul 04, 2024 at 05:18:04PM -0700, Kees Cook wrote: > > > > > On Thu, Jul 04, 2024 at 09:01:34PM +0200, Mickaël Salaün wrote: > > > > > > Such a secure environment can be achieved with an appropriate access > > > > > > control policy (e.g. mount's noexec option, file access rights, LSM > > > > > > configuration) and an enlighten ld.so checking that libraries are > > > > > > allowed for execution e.g., to protect against illegitimate use of > > > > > > LD_PRELOAD. > > > > > > > > > > > > Scripts may need some changes to deal with untrusted data (e.g. stdin, > > > > > > environment variables), but that is outside the scope of the kernel. > > > > > > > > > > If the threat model includes an attacker sitting at a shell prompt, we > > > > > need to be very careful about how process perform enforcement. E.g. even > > > > > on a locked down system, if an attacker has access to LD_PRELOAD or a > > > > > > > > LD_PRELOAD should be OK once ld.so will be patched to check the > > > > libraries. We can still imagine a debug library used to bypass security > > > > checks, but in this case the issue would be that this library is > > > > executable in the first place. > > > > > > Ah yes, that's fair: the shell would discover the malicious library > > > while using AT_CHECK during resolution of the LD_PRELOAD. > > > > That's the idea, but it would be checked by ld.so, not the shell. > > > > > > > > > > seccomp wrapper (which you both mention here), it would be possible to > > > > > run commands where the resulting process is tricked into thinking it > > > > > doesn't have the bits set. > > > > > > > > As explained in the UAPI comments, all parent processes need to be > > > > trusted. This meeans that their code is trusted, their seccomp filters > > > > are trusted, and that they are patched, if needed, to check file > > > > executability. > > > > > > But we have launchers that apply arbitrary seccomp policy, e.g. minijail > > > on Chrome OS, or even systemd on regular distros. In theory, this should > > > be handled via other ACLs. > > > > Processes running with untrusted seccomp filter should be considered > > untrusted. It would then make sense for these seccomp filters/programs > > to be considered executable code, and then for minijail and systemd to > > check them with AT_CHECK (according to the securebits policy). > > > > > > > > > > But this would be exactly true for calling execveat(): LD_PRELOAD or > > > > > seccomp policy could have it just return 0. > > > > > > > > If an attacker is allowed/able to load an arbitrary seccomp filter on a > > > > process, we cannot trust this process. > > > > > > > > > > > > > > While I like AT_CHECK, I do wonder if it's better to do the checks via > > > > > open(), as was originally designed with O_MAYEXEC. Because then > > > > > enforcement is gated by the kernel -- the process does not get a file > > > > > descriptor _at all_, no matter what LD_PRELOAD or seccomp tricks it into > > > > > doing. > > > > > > > > Being able to check a path name or a file descriptor (with the same > > > > syscall) is more flexible and cover more use cases. > > > > > > If flexibility costs us reliability, I think that flexibility is not > > > a benefit. > > > > Well, it's a matter of letting user space do what they think is best, > > and I think there are legitimate and safe uses of path names, even if I > > agree that this should not be used in most use cases. Would we want > > faccessat2(2) to only take file descriptor as argument and not file > > path? I don't think so but I'd defer to the VFS maintainers. > > > > Christian, Al, Linus? > > > > Steve, could you share a use case with file paths? > > > > > > > > > The execveat(2) > > > > interface, including current and future flags, is dedicated to file > > > > execution. I then think that using execveat(2) for this kind of check > > > > makes more sense, and will easily evolve with this syscall. > > > > > > Yeah, I do recognize that is feels much more natural, but I remain > > > unhappy about how difficult it will become to audit a system for safety > > > when the check is strictly per-process opt-in, and not enforced by the > > > kernel for a given process tree. But, I think this may have always been > > > a fiction in my mind. :) > > > > Hmm, I'm not sure to follow. Securebits are inherited, so process tree. > > And we need the parent processes to be trusted anyway. > > > > > > > > > > And this thinking also applies to faccessat() too: if a process can be > > > > > tricked into thinking the access check passed, it'll happily interpret > > > > > whatever. :( But not being able to open the fd _at all_ when O_MAYEXEC > > > > > is being checked seems substantially safer to me... > > > > > > > > If attackers can filter execveat(2), they can also filter open(2) and > > > > any other syscalls. In all cases, that would mean an issue in the > > > > security policy. > > > > > > Hm, as in, make a separate call to open(2) without O_MAYEXEC, and pass > > > that fd back to the filtered open(2) that did have O_MAYEXEC. Yes, true. > > > > > > I guess it does become morally equivalent. > > > > > > Okay. Well, let me ask about usability. Right now, a process will need > > > to do: > > > > > > - should I use AT_CHECK? (check secbit) > > > - if yes: perform execveat(AT_CHECK) > > > > > > Why not leave the secbit test up to the kernel, and then the program can > > > just unconditionally call execveat(AT_CHECK)? > > > > That was kind of the approach of the previous patch series and Linus > > wanted the new interface to follow the kernel semantic. Enforcing this > > kind of restriction will always be the duty of user space anyway, so I > > think it's simpler (i.e. no mix of policy definition, access check, and > > policy enforcement, but a standalone execveat feature), more flexible, > > and it fully delegates the policy enforcement to user space instead of > > trying to enforce some part in the kernel which would only give the > > illusion of security/policy enforcement. > > A problem could be that from IMA perspective there is no indication on > whether the interpreter executed or not execveat(). Sure, we can detect > that the binary supports it, but if the enforcement was > enabled/disabled that it is not recorded. We should assume that if the interpreter call execveat+AT_CHECK, it will enforce restrictions according to its securebits. > > Maybe, setting the process flags should be influenced by the kernel, > for example not allowing changes and enforcing when there is an IMA > policy loaded requiring to measure/appraise scripts. LSMs can set the required securebits per task/interpreter according to their policies. > > Roberto > > > > > > > Though perhaps the issue here is that an execveat() EINVAL doesn't > > > tell the program if AT_CHECK is unimplemented or if something else > > > went wrong, and the secbit prctl() will give the correct signal about > > > AT_CHECK availability? > > > > This kind of check could indeed help to identify the issue. > >
On Fri, Jul 5, 2024 at 3:02 AM Mickaël Salaün <mic@digikod.net> wrote: > > These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and > their *_LOCKED counterparts are designed to be set by processes setting > up an execution environment, such as a user session, a container, or a > security sandbox. Like seccomp filters or Landlock domains, the > securebits are inherited across proceses. > > When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should > check executable resources with execveat(2) + AT_CHECK (see previous > patch). > > When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow > execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK). I read this twice, slept on it, read them again, and I *still* can't understand it. See below... > The only restriction enforced by the kernel is the right to ptrace > another process. Processes are denied to ptrace less restricted ones, > unless the tracer has CAP_SYS_PTRACE. This is mainly a safeguard to > avoid trivial privilege escalations e.g., by a debugging process being > abused with a confused deputy attack. What's the actual issue? And why can't I, as root, do, in a carefully checked, CHECK'd and RESTRICT'd environment, # gdb -p <pid>? Adding weird restrictions to ptrace can substantially *weaken* security because it forces people to do utterly daft things to work around the restrictions. ... > +/* > + * When SECBIT_SHOULD_EXEC_CHECK is set, a process should check all executable > + * files with execveat(2) + AT_CHECK. However, such check should only be > + * performed if all to-be-executed code only comes from regular files. For > + * instance, if a script interpreter is called with both a script snipped as s/snipped/snippet/ > + * argument and a regular file, the interpreter should not check any file. > + * Doing otherwise would mislead the kernel to think that only the script file > + * is being executed, which could for instance lead to unexpected permission > + * change and break current use cases. This is IMO not nearly clear enough to result in multiple user implementations and a kernel implementation and multiple LSM implementations and LSM policy authors actually agreeing as to what this means. I also think it's wrong to give user code instructions about what kernel checks it should do. Have the user code call the kernel and have the kernel implement the policy. > +/* > + * When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow > + * execution of approved files, if any (see SECBIT_SHOULD_EXEC_CHECK). For > + * instance, script interpreters called with a script snippet as argument > + * should always deny such execution if SECBIT_SHOULD_EXEC_RESTRICT is set. > + * However, if a script interpreter is called with both > + * SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT, they should > + * interpret the provided script files if no unchecked code is also provided > + * (e.g. directly as argument). I think you're trying to say that this is like (the inverse of) Content-Security-Policy: unsafe-inline. In other words, you're saying that, if RESTRICT is set, then programs should not execute code-like text that didn't come from a file. Is that right? I feel like it would be worth looking at the state of the art of Content-Security-Policy and all the lessons people have learned from it. Whatever the result is should be at least as comprehensible and at least as carefully engineered as Content-Security-Policy. --Andy
On Sat, Jul 20, 2024 at 10:06:28AM +0800, Andy Lutomirski wrote: > On Fri, Jul 5, 2024 at 3:02 AM Mickaël Salaün <mic@digikod.net> wrote: > > > > These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and > > their *_LOCKED counterparts are designed to be set by processes setting > > up an execution environment, such as a user session, a container, or a > > security sandbox. Like seccomp filters or Landlock domains, the > > securebits are inherited across proceses. > > > > When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should > > check executable resources with execveat(2) + AT_CHECK (see previous > > patch). > > > > When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow > > execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK). > > I read this twice, slept on it, read them again, and I *still* can't > understand it. See below... There is a new proposal: https://lore.kernel.org/all/20240710.eiKohpa4Phai@digikod.net/ The new securebits will be SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE. I'll send a new patch series with that. > > > The only restriction enforced by the kernel is the right to ptrace > > another process. Processes are denied to ptrace less restricted ones, > > unless the tracer has CAP_SYS_PTRACE. This is mainly a safeguard to > > avoid trivial privilege escalations e.g., by a debugging process being > > abused with a confused deputy attack. > > What's the actual issue? And why can't I, as root, do, in a carefully > checked, CHECK'd and RESTRICT'd environment, # gdb -p <pid>? Adding > weird restrictions to ptrace can substantially *weaken* security > because it forces people to do utterly daft things to work around the > restrictions. Restricting ptrace was a cautious approach, but I get you point and I agree. I'll remove the ptrace restrictions in the next patch series. > > ... > > > +/* > > + * When SECBIT_SHOULD_EXEC_CHECK is set, a process should check all executable > > + * files with execveat(2) + AT_CHECK. However, such check should only be > > + * performed if all to-be-executed code only comes from regular files. For > > + * instance, if a script interpreter is called with both a script snipped as > > s/snipped/snippet/ > > > + * argument and a regular file, the interpreter should not check any file. > > + * Doing otherwise would mislead the kernel to think that only the script file > > + * is being executed, which could for instance lead to unexpected permission > > + * change and break current use cases. > > This is IMO not nearly clear enough to result in multiple user > implementations and a kernel implementation and multiple LSM > implementations and LSM policy authors actually agreeing as to what > this means. Right, no kernel parts (e.g. LSMs) should try to infer anything other than an executability check. We should handle things such as role transitions with something else (e.g. a complementary dedicated flag), and that should be decorrelated from this patch series. > > I also think it's wrong to give user code instructions about what > kernel checks it should do. Have the user code call the kernel and > have the kernel implement the policy. Call the kernel for what? Script interpreter is a user space thing, and restrictions enforced on interpreters need to be a user space thing. The kernel cannot restrict user space according to a semantic only defined by user space, such as Python interpretation, CLI arguments, content of environment variables... If a process wants to interpret some data and turn than into code, there is no way for the kernel to know about that. > > > +/* > > + * When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow > > + * execution of approved files, if any (see SECBIT_SHOULD_EXEC_CHECK). For > > + * instance, script interpreters called with a script snippet as argument > > + * should always deny such execution if SECBIT_SHOULD_EXEC_RESTRICT is set. > > + * However, if a script interpreter is called with both > > + * SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT, they should > > + * interpret the provided script files if no unchecked code is also provided > > + * (e.g. directly as argument). > > I think you're trying to say that this is like (the inverse of) > Content-Security-Policy: unsafe-inline. In other words, you're saying > that, if RESTRICT is set, then programs should not execute code-like > text that didn't come from a file. Is that right? That is the definition of the new SECBIT_EXEC_DENY_INTERACTIVE, which should be clearer. > > I feel like it would be worth looking at the state of the art of > Content-Security-Policy and all the lessons people have learned from > it. Whatever the result is should be at least as comprehensible and > at least as carefully engineered as Content-Security-Policy. That's a good idea, but I guess Content-Security-Policy cannot be directly applied here. My understanding is that CSP enables web servers to request restrictions on code they provide. In the AT_CHECK+securebits case, the policy is defined and enforced by the interpreter, not necessarily the script provider. One big difference is that web servers (should) know the scripts they provide, and can then request the browser to ensure that they do what they should do, while the script interpreter trusts the kernel to check security properties of a script. In other words, something like CSP could be implemented with AT_CHECK+securebits and a LSM policy (e.g. according to file's xattr). > > --Andy
diff --git a/include/uapi/linux/securebits.h b/include/uapi/linux/securebits.h index d6d98877ff1a..3fdb0382718b 100644 --- a/include/uapi/linux/securebits.h +++ b/include/uapi/linux/securebits.h @@ -52,10 +52,64 @@ #define SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED \ (issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE_LOCKED)) +/* + * When SECBIT_SHOULD_EXEC_CHECK is set, a process should check all executable + * files with execveat(2) + AT_CHECK. However, such check should only be + * performed if all to-be-executed code only comes from regular files. For + * instance, if a script interpreter is called with both a script snipped as + * argument and a regular file, the interpreter should not check any file. + * Doing otherwise would mislead the kernel to think that only the script file + * is being executed, which could for instance lead to unexpected permission + * change and break current use cases. + * + * This secure bit may be set by user session managers, service managers, + * container runtimes, sandboxer tools... Except for test environments, the + * related SECBIT_SHOULD_EXEC_CHECK_LOCKED bit should also be set. + * + * Ptracing another process is deny if the tracer has SECBIT_SHOULD_EXEC_CHECK + * but not the tracee. SECBIT_SHOULD_EXEC_CHECK_LOCKED also checked. + */ +#define SECURE_SHOULD_EXEC_CHECK 8 +#define SECURE_SHOULD_EXEC_CHECK_LOCKED 9 /* make bit-8 immutable */ + +#define SECBIT_SHOULD_EXEC_CHECK (issecure_mask(SECURE_SHOULD_EXEC_CHECK)) +#define SECBIT_SHOULD_EXEC_CHECK_LOCKED \ + (issecure_mask(SECURE_SHOULD_EXEC_CHECK_LOCKED)) + +/* + * When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow + * execution of approved files, if any (see SECBIT_SHOULD_EXEC_CHECK). For + * instance, script interpreters called with a script snippet as argument + * should always deny such execution if SECBIT_SHOULD_EXEC_RESTRICT is set. + * However, if a script interpreter is called with both + * SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT, they should + * interpret the provided script files if no unchecked code is also provided + * (e.g. directly as argument). + * + * This secure bit may be set by user session managers, service managers, + * container runtimes, sandboxer tools... Except for test environments, the + * related SECBIT_SHOULD_EXEC_RESTRICT_LOCKED bit should also be set. + * + * Ptracing another process is deny if the tracer has + * SECBIT_SHOULD_EXEC_RESTRICT but not the tracee. + * SECBIT_SHOULD_EXEC_RESTRICT_LOCKED is also checked. + */ +#define SECURE_SHOULD_EXEC_RESTRICT 10 +#define SECURE_SHOULD_EXEC_RESTRICT_LOCKED 11 /* make bit-8 immutable */ + +#define SECBIT_SHOULD_EXEC_RESTRICT (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT)) +#define SECBIT_SHOULD_EXEC_RESTRICT_LOCKED \ + (issecure_mask(SECURE_SHOULD_EXEC_RESTRICT_LOCKED)) + #define SECURE_ALL_BITS (issecure_mask(SECURE_NOROOT) | \ issecure_mask(SECURE_NO_SETUID_FIXUP) | \ issecure_mask(SECURE_KEEP_CAPS) | \ - issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE)) + issecure_mask(SECURE_NO_CAP_AMBIENT_RAISE) | \ + issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \ + issecure_mask(SECURE_SHOULD_EXEC_RESTRICT)) #define SECURE_ALL_LOCKS (SECURE_ALL_BITS << 1) +#define SECURE_ALL_UNPRIVILEGED (issecure_mask(SECURE_SHOULD_EXEC_CHECK) | \ + issecure_mask(SECURE_SHOULD_EXEC_RESTRICT)) + #endif /* _UAPI_LINUX_SECUREBITS_H */ diff --git a/security/commoncap.c b/security/commoncap.c index 162d96b3a676..34b4493e2a69 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -117,6 +117,33 @@ int cap_settime(const struct timespec64 *ts, const struct timezone *tz) return 0; } +static bool ptrace_secbits_allowed(const struct cred *tracer, + const struct cred *tracee) +{ + const unsigned long tracer_secbits = SECURE_ALL_UNPRIVILEGED & + tracer->securebits; + const unsigned long tracee_secbits = SECURE_ALL_UNPRIVILEGED & + tracee->securebits; + /* Ignores locking of unset secure bits (cf. SECURE_ALL_LOCKS). */ + const unsigned long tracer_locked = (tracer_secbits << 1) & + tracer->securebits; + const unsigned long tracee_locked = (tracee_secbits << 1) & + tracee->securebits; + + /* The tracee must not have less constraints than the tracer. */ + if ((tracer_secbits | tracee_secbits) != tracee_secbits) + return false; + + /* + * Makes sure that the tracer's locks for restrictions are the same for + * the tracee. + */ + if ((tracer_locked | tracee_locked) != tracee_locked) + return false; + + return true; +} + /** * cap_ptrace_access_check - Determine whether the current process may access * another @@ -146,7 +173,8 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode) else caller_caps = &cred->cap_permitted; if (cred->user_ns == child_cred->user_ns && - cap_issubset(child_cred->cap_permitted, *caller_caps)) + cap_issubset(child_cred->cap_permitted, *caller_caps) && + ptrace_secbits_allowed(cred, child_cred)) goto out; if (ns_capable(child_cred->user_ns, CAP_SYS_PTRACE)) goto out; @@ -178,7 +206,8 @@ int cap_ptrace_traceme(struct task_struct *parent) cred = __task_cred(parent); child_cred = current_cred(); if (cred->user_ns == child_cred->user_ns && - cap_issubset(child_cred->cap_permitted, cred->cap_permitted)) + cap_issubset(child_cred->cap_permitted, cred->cap_permitted) && + ptrace_secbits_allowed(cred, child_cred)) goto out; if (has_ns_capability(parent, child_cred->user_ns, CAP_SYS_PTRACE)) goto out; @@ -1302,21 +1331,39 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3, & (old->securebits ^ arg2)) /*[1]*/ || ((old->securebits & SECURE_ALL_LOCKS & ~arg2)) /*[2]*/ || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS)) /*[3]*/ - || (cap_capable(current_cred(), - current_cred()->user_ns, - CAP_SETPCAP, - CAP_OPT_NONE) != 0) /*[4]*/ /* * [1] no changing of bits that are locked * [2] no unlocking of locks * [3] no setting of unsupported bits - * [4] doing anything requires privilege (go read about - * the "sendmail capabilities bug") */ ) /* cannot change a locked bit */ return -EPERM; + /* + * Doing anything requires privilege (go read about the + * "sendmail capabilities bug"), except for unprivileged bits. + * Indeed, the SECURE_ALL_UNPRIVILEGED bits are not + * restrictions enforced by the kernel but by user space on + * itself. The kernel is only in charge of protecting against + * privilege escalation with ptrace protections. + */ + if (cap_capable(current_cred(), current_cred()->user_ns, + CAP_SETPCAP, CAP_OPT_NONE) != 0) { + const unsigned long unpriv_and_locks = + SECURE_ALL_UNPRIVILEGED | + SECURE_ALL_UNPRIVILEGED << 1; + const unsigned long changed = old->securebits ^ arg2; + + /* For legacy reason, denies non-change. */ + if (!changed) + return -EPERM; + + /* Denies privileged changes. */ + if (changed & ~unpriv_and_locks) + return -EPERM; + } + new = prepare_creds(); if (!new) return -ENOMEM;
These new SECBIT_SHOULD_EXEC_CHECK, SECBIT_SHOULD_EXEC_RESTRICT, and their *_LOCKED counterparts are designed to be set by processes setting up an execution environment, such as a user session, a container, or a security sandbox. Like seccomp filters or Landlock domains, the securebits are inherited across proceses. When SECBIT_SHOULD_EXEC_CHECK is set, programs interpreting code should check executable resources with execveat(2) + AT_CHECK (see previous patch). When SECBIT_SHOULD_EXEC_RESTRICT is set, a process should only allow execution of approved resources, if any (see SECBIT_SHOULD_EXEC_CHECK). For a secure environment, we might also want SECBIT_SHOULD_EXEC_CHECK_LOCKED and SECBIT_SHOULD_EXEC_RESTRICT_LOCKED to be set. For a test environment (e.g. testing on a fleet to identify potential issues), only the SECBIT_SHOULD_EXEC_CHECK* bits can be set to still be able to identify potential issues (e.g. with interpreters logs or LSMs audit entries). It should be noted that unlike other security bits, the SECBIT_SHOULD_EXEC_CHECK and SECBIT_SHOULD_EXEC_RESTRICT bits are dedicated to user space willing to restrict itself. Because of that, they only make sense in the context of a trusted environment (e.g. sandbox, container, user session, full system) where the process changing its behavior (according to these bits) and all its parent processes are trusted. Otherwise, any parent process could just execute its own malicious code (interpreting a script or not), or even enforce a seccomp filter to mask these bits. Such a secure environment can be achieved with an appropriate access control policy (e.g. mount's noexec option, file access rights, LSM configuration) and an enlighten ld.so checking that libraries are allowed for execution e.g., to protect against illegitimate use of LD_PRELOAD. Scripts may need some changes to deal with untrusted data (e.g. stdin, environment variables), but that is outside the scope of the kernel. The only restriction enforced by the kernel is the right to ptrace another process. Processes are denied to ptrace less restricted ones, unless the tracer has CAP_SYS_PTRACE. This is mainly a safeguard to avoid trivial privilege escalations e.g., by a debugging process being abused with a confused deputy attack. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Mickaël Salaün <mic@digikod.net> Link: https://lore.kernel.org/r/20240704190137.696169-3-mic@digikod.net --- New design since v18: https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net --- include/uapi/linux/securebits.h | 56 ++++++++++++++++++++++++++++- security/commoncap.c | 63 ++++++++++++++++++++++++++++----- 2 files changed, 110 insertions(+), 9 deletions(-)