Message ID | 20220321161557.495388-1-mic@digikod.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [GIT,PULL] Add trusted_for(2) (was O_MAYEXEC) | expand |
On Mon, Mar 21, 2022 at 05:15:57PM +0100, Mickaël Salaün wrote: > Since I heard no objection, please consider to pull this code for > v5.18-rc1 . These five patches have been successfully tested in the > latest linux-next releases for several weeks. > kernel/sysctl.c | 9 + Please don't add more sysctls there. We're slowly trying to move all these to their respective places so this does not become a larger kitchen sink mess. Luis
On 21/03/2022 18:38, Luis Chamberlain wrote: > On Mon, Mar 21, 2022 at 05:15:57PM +0100, Mickaël Salaün wrote: >> Since I heard no objection, please consider to pull this code for >> v5.18-rc1 . These five patches have been successfully tested in the >> latest linux-next releases for several weeks. > >> kernel/sysctl.c | 9 + > > Please don't add more sysctls there. We're slowly trying to move > all these to their respective places so this does not become a larger > kitchen sink mess. It is not a new sysctl but proc_dointvec_minmax_sysadmin(). This helper is shared between printk and fs subsystems.
On Mon, Mar 21, 2022 at 07:05:42PM +0100, Mickaël Salaün wrote: > > On 21/03/2022 18:38, Luis Chamberlain wrote: > > On Mon, Mar 21, 2022 at 05:15:57PM +0100, Mickaël Salaün wrote: > > > Since I heard no objection, please consider to pull this code for > > > v5.18-rc1 . These five patches have been successfully tested in the > > > latest linux-next releases for several weeks. > > > > > kernel/sysctl.c | 9 + > > > > Please don't add more sysctls there. We're slowly trying to move > > all these to their respective places so this does not become a larger > > kitchen sink mess. > > It is not a new sysctl but proc_dointvec_minmax_sysadmin(). This helper is > shared between printk and fs subsystems. That should be good then, thanks! Luis
Hi, What is the status of this pull request? Do you need something more? Regards, Mickaël On 21/03/2022 17:15, Mickaël Salaün wrote: > Hi Linus, > > This patch series adds a new syscall named trusted_for. It enables user > space to ask the kernel: is this file descriptor's content trusted to be > used for this purpose? The set of usage currently only contains > execution, but other may follow (e.g. configuration, sensitive data). > If the kernel identifies the file descriptor as trustworthy for this > usage, user space should then take this information into account. The > "execution" usage means that the content of the file descriptor is > trusted according to the system policy to be executed by user space, > which means that it interprets the content or (try to) maps it as > executable memory. > > A simple system-wide security policy can be set by the system > administrator through a sysctl configuration consistent with the mount > points or the file access rights. The documentation explains the > prerequisites. > > It is important to note that this can only enable to extend access > control managed by the kernel. Hence it enables current access control > mechanism to be extended and become a superset of what they can > currently control. Indeed, the security policy could also be delegated > to an LSM, either a MAC system or an integrity system. For instance, > this is required to close a major IMA measurement/appraisal interpreter > integrity gap by bringing the ability to check the use of scripts. > Other uses are expected as well. > > For further details, please see the latest cover letter: > https://lore.kernel.org/r/20220104155024.48023-1-mic@digikod.net > > Commit dae71698b6c5 ("printk: Move back proc_dointvec_minmax_sysadmin() > to sysctl.c") was recently added due to the sysctl refactoring. > > Commit e674341a90b9 ("selftests/interpreter: fix separate directory > build") will fix some test build cases as explained here: > https://lore.kernel.org/r/20220119101531.2850400-1-usama.anjum@collabora.com > Merging this commit without the new KHDR_INCLUDES is not an issue. > The upcoming kselftest pull request is ready: > https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/log/?h=next > > This patch series has been open for review for more than three years and > got a lot of feedbacks (and bikeshedding) which were all considered. > Since I heard no objection, please consider to pull this code for > v5.18-rc1 . These five patches have been successfully tested in the > latest linux-next releases for several weeks. > > Regards, > Mickaël > > -- > The following changes since commit dcb85f85fa6f142aae1fe86f399d4503d49f2b60: > > gcc-plugins/stackleak: Use noinstr in favor of notrace (2022-02-03 17:02:21 -0800) > > are available in the Git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git tags/trusted-for-v18 > > for you to fetch changes up to e674341a90b95c3458d684ae25e6891afc3e03ad: > > selftests/interpreter: fix separate directory build (2022-03-04 10:56:25 +0100) > > ---------------------------------------------------------------- > Add the trusted_for system call (v18) > > The final goal of this patch series is to enable the kernel to be a > global policy manager by entrusting processes with access control at > their level. To reach this goal, two complementary parts are required: > * user space needs to be able to know if it can trust some file > descriptor content for a specific usage; > * and the kernel needs to make available some part of the policy > configured by the system administrator. > > In a nutshell, this is a required building block to control script > execution. > > For further details see the latest cover letter: > https://lore.kernel.org/r/20220104155024.48023-1-mic@digikod.net > > ---------------------------------------------------------------- > Mickaël Salaün (4): > printk: Move back proc_dointvec_minmax_sysadmin() to sysctl.c > fs: Add trusted_for(2) syscall implementation and related sysctl > arch: Wire up trusted_for(2) > selftest/interpreter: Add tests for trusted_for(2) policies > > Muhammad Usama Anjum (1): > selftests/interpreter: fix separate directory build > > Documentation/admin-guide/sysctl/fs.rst | 50 +++ > arch/alpha/kernel/syscalls/syscall.tbl | 1 + > arch/arm/tools/syscall.tbl | 1 + > arch/arm64/include/asm/unistd.h | 2 +- > arch/arm64/include/asm/unistd32.h | 2 + > arch/ia64/kernel/syscalls/syscall.tbl | 1 + > arch/m68k/kernel/syscalls/syscall.tbl | 1 + > arch/microblaze/kernel/syscalls/syscall.tbl | 1 + > arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + > arch/mips/kernel/syscalls/syscall_n64.tbl | 1 + > arch/mips/kernel/syscalls/syscall_o32.tbl | 1 + > arch/parisc/kernel/syscalls/syscall.tbl | 1 + > arch/powerpc/kernel/syscalls/syscall.tbl | 1 + > arch/s390/kernel/syscalls/syscall.tbl | 1 + > arch/sh/kernel/syscalls/syscall.tbl | 1 + > arch/sparc/kernel/syscalls/syscall.tbl | 1 + > arch/x86/entry/syscalls/syscall_32.tbl | 1 + > arch/x86/entry/syscalls/syscall_64.tbl | 1 + > arch/xtensa/kernel/syscalls/syscall.tbl | 1 + > fs/open.c | 133 ++++++++ > fs/proc/proc_sysctl.c | 2 +- > include/linux/syscalls.h | 1 + > include/linux/sysctl.h | 3 + > include/uapi/asm-generic/unistd.h | 5 +- > include/uapi/linux/trusted-for.h | 18 + > kernel/printk/sysctl.c | 9 - > kernel/sysctl.c | 9 + > tools/testing/selftests/Makefile | 1 + > tools/testing/selftests/interpreter/.gitignore | 2 + > tools/testing/selftests/interpreter/Makefile | 21 ++ > tools/testing/selftests/interpreter/config | 1 + > .../selftests/interpreter/trust_policy_test.c | 362 +++++++++++++++++++++ > 32 files changed, 625 insertions(+), 12 deletions(-) > create mode 100644 include/uapi/linux/trusted-for.h > create mode 100644 tools/testing/selftests/interpreter/.gitignore > create mode 100644 tools/testing/selftests/interpreter/Makefile > create mode 100644 tools/testing/selftests/interpreter/config > create mode 100644 tools/testing/selftests/interpreter/trust_policy_test.c
On Mon, Mar 21, 2022 at 05:15:57PM +0100, Mickaël Salaün wrote: > [...] > For further details, please see the latest cover letter: > https://lore.kernel.org/r/20220104155024.48023-1-mic@digikod.net > > Commit dae71698b6c5 ("printk: Move back proc_dointvec_minmax_sysadmin() > to sysctl.c") was recently added due to the sysctl refactoring. > > Commit e674341a90b9 ("selftests/interpreter: fix separate directory > build") will fix some test build cases as explained here: > https://lore.kernel.org/r/20220119101531.2850400-1-usama.anjum@collabora.com > Merging this commit without the new KHDR_INCLUDES is not an issue. > The upcoming kselftest pull request is ready: > https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/log/?h=next > > This patch series has been open for review for more than three years and > got a lot of feedbacks (and bikeshedding) which were all considered. > Since I heard no objection, please consider to pull this code for > v5.18-rc1 . These five patches have been successfully tested in the > latest linux-next releases for several weeks. Hi Linus, It looks like this didn't get pulled for -rc1 even though it was sent during the merge window and has been in -next for a while. It would be really nice to get this landed since userspace can't make any forward progress without the kernel support. Was there some issue blocking this from being merged? All the feedback I can find on prior versions was addressed. -Kees
On Mon, Apr 4, 2022 at 11:40 AM Kees Cook <keescook@chromium.org> wrote: > > It looks like this didn't get pulled for -rc1 even though it was sent > during the merge window and has been in -next for a while. It would be > really nice to get this landed since userspace can't make any forward > progress without the kernel support. Honestly, I need a *lot* better reasoning for random new non-standard system calls than this had. And this kind of "completely random interface with no semantics except for random 'future flags'" I will not pull even *with* good reasoning. I already told Mickaël in private that I wouldn't pull this. Honestly, we have a *horrible* history with non-standard system calls, and that's been true even for well-designed stuff that actually matters, that people asked for. Something like this, which adds one very special system call and where the whole thing is designed for "let's add something random later because we don't even know what we want" is right out. What the system call seems to actually *want* is basically a new flag to access() (and faccessat()). One that is very close to what X_OK already is. But that wasn't how it was sold. So no. No way will this ever get merged, and whoever came up with that disgusting "trusted_for()" (for WHAT? WHO TRUSTS? WHY?) should look themselves in the mirror. If you add a new X_OK variant to access(), maybe that could fly. Linus
On 04/04/2022 20:47, Linus Torvalds wrote: > On Mon, Apr 4, 2022 at 11:40 AM Kees Cook <keescook@chromium.org> wrote: >> >> It looks like this didn't get pulled for -rc1 even though it was sent >> during the merge window and has been in -next for a while. It would be >> really nice to get this landed since userspace can't make any forward >> progress without the kernel support. > > Honestly, I need a *lot* better reasoning for random new non-standard > system calls than this had. > > And this kind of "completely random interface with no semantics except > for random 'future flags'" I will not pull even *with* good reasoning. I think the semantic is well defined: "This new syscall enables user space to ask the kernel: is this file descriptor's content trusted to be used for this purpose?" See the trusted_for_policy sysctl documentation: https://lore.kernel.org/all/20220104155024.48023-3-mic@digikod.net/ There is currently only one defined and implemented purpose: execution (or script interpretation). There is room for other flags because it is a good practice to do so, and other purposes were proposed. > > I already told Mickaël in private that I wouldn't pull this. > > Honestly, we have a *horrible* history with non-standard system calls, > and that's been true even for well-designed stuff that actually > matters, that people asked for. > > Something like this, which adds one very special system call and > where the whole thing is designed for "let's add something random > later because we don't even know what we want" is right out. > > What the system call seems to actually *want* is basically a new flag > to access() (and faccessat()). One that is very close to what X_OK > already is. I agree. > > But that wasn't how it was sold. > > So no. No way will this ever get merged, and whoever came up with that > disgusting "trusted_for()" (for WHAT? WHO TRUSTS? WHY?) should look > themselves in the mirror. Well, naming is difficult, but I'm open to suggestion. :) As explained in the description, the WHAT is the file descriptor content, the WHO TRUSTS is the system security policy (e.g. the mount point options) and the WHY is defined by the usage flag (TRUSTED_FOR_EXECUTION). This translates to: is this file descriptor's content trusted to be used for this specified purpose/usage? > > If you add a new X_OK variant to access(), maybe that could fly. As answered in private, that was the approach I took for one of the early versions but a dedicated syscall was requested by Al Viro: https://lore.kernel.org/r/2ed377c4-3500-3ddc-7181-a5bc114ddf94@digikod.net The main reason behind this request was that it doesn't have the exact same semantic as faccessat(2). The changes for this syscall are documented here: https://lore.kernel.org/all/20220104155024.48023-3-mic@digikod.net/ The whole history is linked in the cover letter: https://lore.kernel.org/all/2ed377c4-3500-3ddc-7181-a5bc114ddf94@digikod.net/ This initial proposal was using a new faccessat2(2) flag: AT_INTERPRETED, see https://lore.kernel.org/all/20200908075956.1069018-2-mic@digikod.net/ What do you think about that? I'm happy to get back to this version if everyone is OK with it.
On Mon, Apr 4, 2022 at 1:29 PM Mickaël Salaün <mic@digikod.net> wrote: > > This initial proposal was using a new faccessat2(2) flag: > AT_INTERPRETED, see > https://lore.kernel.org/all/20200908075956.1069018-2-mic@digikod.net/ > What do you think about that? I'm happy to get back to this version if > everyone is OK with it. I'm certainly happi_er_ with that, but I find that particular patch odd for other reasons. In no particular order: - what's with the insane non-C syntax? Double parentheses have no actual meaning in C: if ((flags & AT_INTERPRETED)) { if ((mode & MAY_EXEC)) { so I don't understand why you'd use that strance thing. - why is this an AT_INTERPRETED flag? I don't understand the name, I don't understand the semantics. Why would that flag have the same value as AT_SYMLINK_FOLLOW? Why isn't this just a new _mode_ bit, which is what I feel is sensible? We only use three bits (with no bits set meaning "existence"), so we have *tons* of bits left in that namespace, and it would make much more sense to me to have #define EXECVE_OK 8 which is the same as the "group exec" bit, so it actually makes some kind of sense too. - related to that "I don't understand the semantics", the "documentation" for that thing doesn't make sense either: + The + main usage is for script + interpreters to enforce a policy + consistent with the kernel's one + (through sysctl configuration or LSM + policy). */ Now, what I *think* you mean is (1) user-space executable loaders want to be able to test the *same* policy as the kernel does for execve() (2) access(path, EXECVE_OK) will do the same permission checks as "execve()" would do for that path (3) if you already have the fd open, use "faccess(fd, NULL, F_OK_TO_EXECUTE, AT_EMPTY_PATH)" (4) maybe we want to add a flag for the "euid vs real uid", and that would be in the "flags" field, since that changes the actual *lookup* semantics Note that that (4) is something that some normal user space has wanted in the past too (GNU libcs has a "eaccess()" thing for "effective uid access"). And yes, we still need to talk details: - no backwards compatibility issues, because we've happily always checked 'mode' for being valid, so old kernels will always return -EINVAL. - POSIX namespace issues for EXECVE_OK means that the name probably needs some thinking (maybe we just need to call it __ACCESS_OK_EXECVE internally or something - the kernel actually doesn't even export the existing [FRWX]_OK values, because we "know" they map tho the usual "owner RWX" bits, with F being "no bit set") - I really want the exact semantics very clearly defined. I think it's ok to say "exact same security check as for 'execve()'", but even then we need to have that discussion about (a) "what about suid bits that user space cannot react to" (b) that whole "effective vs real" discussion So to recap: I'm very much ok with some access() extension, but I think even that very much needs clarification, and the existing patch is just odd in many many ways. Linus
On Mon, Apr 4, 2022 at 2:28 PM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > (4) maybe we want to add a flag for the "euid vs real uid", and that > would be in the "flags" field, since that changes the actual *lookup* > semantics Duh. We already did that long ago, and it's there as AT_EACCESS. I should have just looked at the code more closely. But that "you didn't even check what we already do, Linus" thing just makes it even more obvious that all of this makes perfect sense in the confines of access() already, and a new "check _these_ protections" should just be a new mode bit. Linus
On Mon, Apr 04, 2022 at 02:28:19PM -0700, Linus Torvalds wrote: > Now, what I *think* you mean is > > (1) user-space executable loaders want to be able to test the *same* > policy as the kernel does for execve() Right. The script interpreter wants to ask "if this file were actually an ELF going through execve(), would the kernel allow it?" > (2) access(path, EXECVE_OK) will do the same permission checks as > "execve()" would do for that path Maybe. I defer to Mickaël here, but my instinct is to avoid creating an API that can be accidentally misused. I'd like this to be fd-only based, since that removes path name races. (e.g. trusted_for() required an fd.) > (3) if you already have the fd open, use "faccess(fd, NULL, > F_OK_TO_EXECUTE, AT_EMPTY_PATH)" Yes, specifically faccessat2(). (And continuing the race thought above, yes, there could still be races if the content of the file could be changed, but that case is less problematic under real-world conditions.) > (4) maybe we want to add a flag for the "euid vs real uid", and that > would be in the "flags" field, since that changes the actual *lookup* > semantics > > Note that that (4) is something that some normal user space has wanted > in the past too (GNU libcs has a "eaccess()" thing for "effective uid > access"). I think this already exists as AT_EACCESS? It was added with faccessat2() itself, if I'm reading the history correctly. And I just need to say that the thought of setuid script interpreters still makes me sad. :) > - I really want the exact semantics very clearly defined. I think > it's ok to say "exact same security check as for 'execve()'", but even > then we need to have that discussion about > > (a) "what about suid bits that user space cannot react to" What do you mean here? Do you mean setid bits on the file itself? > (b) that whole "effective vs real" discussion I think this is handled with AT_EACCESS?
On Mon, Apr 4, 2022 at 3:25 PM Kees Cook <keescook@chromium.org> wrote: > > Maybe. I defer to Mickaël here, but my instinct is to avoid creating an > API that can be accidentally misused. I'd like this to be fd-only based, > since that removes path name races. (e.g. trusted_for() required an fd.) Some people want pathnames. Think things like just the PATH thing just to find the right executable in the first place. For things like that, races don't matter, because you're just trying to find the right path to begin with. > I think this already exists as AT_EACCESS? It was added with > faccessat2() itself, if I'm reading the history correctly. Yeah, I noticed myself, I just hadn't looked (and I don't do enough user-space programming to be aware of if that way). > > (a) "what about suid bits that user space cannot react to" > > What do you mean here? Do you mean setid bits on the file itself? Right. Maybe we don't care. Maybe we do. Is the user-space loader going to honor them? Is it going to ignore them? I don't know. And it actually interacts with things like 'nosuid', which the kernel does know about, and user space has a hard time figuring out. So if the point is "give me an interface so that I can do the same thing a kernel execve() loader would do", then those sgid/suid bits actually may be exactly the kind of thing that user space wants the kernel to react to - should it ignore them, or should it do something special when it sees that they are set? I'm not saying that they *should* be something we care about. All I'm saying is that I want that *discussion* to happen. Linus
On Mon, Apr 04, 2022 at 10:30:13PM +0200, Mickaël Salaün wrote: > > If you add a new X_OK variant to access(), maybe that could fly. > > As answered in private, that was the approach I took for one of the early > versions but a dedicated syscall was requested by Al Viro: > https://lore.kernel.org/r/2ed377c4-3500-3ddc-7181-a5bc114ddf94@digikod.net > The main reason behind this request was that it doesn't have the exact same > semantic as faccessat(2). The changes for this syscall are documented here: > https://lore.kernel.org/all/20220104155024.48023-3-mic@digikod.net/ > The whole history is linked in the cover letter: > https://lore.kernel.org/all/2ed377c4-3500-3ddc-7181-a5bc114ddf94@digikod.net/ As a suggestion, something that can be helpful for something which has been as heavily bike-sheded as this concept might be to write a "legislative history", or perhaps, a "bike shed history". And not just with links to mailing list discussions, but a short summary of why, for example, we moved from the open flag O_MAYEXEC to the faccessat(2) approach. I looked, but I couldn't find the reasoning while diving into the mail archives. And there was some kind of request for some new functionality for IMA and other LSM's that I couldn't follow that is why the new AT_INTERETED flag, but at this point my time quantuum for mailing list archeology most definitely expired. :-) It might be that when all of this is laid out, we can either revisit prior design decisions as "that bike-shed request to support this corner case was unreasonable", or "oh, OK, this is why we need as fully general a solution as this". Also, some of examples of potential future use cases such as "magic links" that were linked in the cover letter, it's not clear to me actually make sense in the context of a "trusted for" system call (although might make more sense in the context of an open flag). So revisiting some of those other cases to see whether they actually *could* be implemented as new "TRUSTED_FOR" flags might be instructive. Personally, I'm a bit skeptical about the prospct of additional use cases, since trusted_for(2) is essentially a mother_should_I(2) request where userspace is asking the kernel whether they should go ahead and do some particular policy thing. And it's not clear to me how many of these policy questions exist where (a) the kernel is in the past position to answer that question, and (b) there isn't some additional information that the kernel doesn't have that might be needed to answer that question. For example, "Mother should I use that private key file" might require information about whether the SRE is currently on pager duty or not, at least for some policies, and the kernel isn't going to have that information. Other examples of TRUSTED_FOR flags that really make sense and would be useful might help alleviate my skepticsm. And the "bike shed history" would help with my question about why some folks didn't like the original O_MAYEXEC flag? Cheers, - Ted
On 04/04/2022 23:28, Linus Torvalds wrote: > On Mon, Apr 4, 2022 at 1:29 PM Mickaël Salaün <mic@digikod.net> wrote: >> >> This initial proposal was using a new faccessat2(2) flag: >> AT_INTERPRETED, see >> https://lore.kernel.org/all/20200908075956.1069018-2-mic@digikod.net/ >> What do you think about that? I'm happy to get back to this version if >> everyone is OK with it. > > I'm certainly happi_er_ with that, but I find that particular patch > odd for other reasons. > > In no particular order: > > - what's with the insane non-C syntax? Double parentheses have no > actual meaning in C: > > if ((flags & AT_INTERPRETED)) { > if ((mode & MAY_EXEC)) { > > so I don't understand why you'd use that strance thing. I guess it comes from a previous version that ANDed two booleans. > > - why is this an AT_INTERPRETED flag? I don't understand the name, I > don't understand the semantics. I wasn't sure it was a good idea to add another mode bit, so I ended up using a flag to not break compatibility of other mode checks but extend the semantic to interpreted scripts. But I agree that a new mode makes sense. > > Why would that flag have the same value as AT_SYMLINK_FOLLOW? It was a bug. > > Why isn't this just a new _mode_ bit, which is what I feel is > sensible? We only use three bits (with no bits set meaning > "existence"), so we have *tons* of bits left in that namespace, and it > would make much more sense to me to have > > #define EXECVE_OK 8 > > which is the same as the "group exec" bit, so it actually makes > some kind of sense too. Looks fine to me. The "EXECVE_" prefix is a bit weird but it will not be defined in the kernel like X_OK and others anyway, and as you said, it matches S_IXGRP. > > - related to that "I don't understand the semantics", the > "documentation" for that thing doesn't make sense either: > > + The > + main usage is for script > + interpreters to enforce a policy > + consistent with the kernel's one > + (through sysctl configuration or LSM > + policy). */ I'll synchronize the documentation with a next series. > > Now, what I *think* you mean is See a following email in reply to Kees. [...] > > And yes, we still need to talk details: > > - no backwards compatibility issues, because we've happily always > checked 'mode' for being valid, so old kernels will always return > -EINVAL. > > - POSIX namespace issues for EXECVE_OK means that the name probably > needs some thinking (maybe we just need to call it __ACCESS_OK_EXECVE > internally or something - the kernel actually doesn't even export the > existing [FRWX]_OK values, because we "know" they map tho the usual > "owner RWX" bits, with F being "no bit set") Right, I cannot find a better name for now. See a following email in reply to Kees. [...] > > So to recap: I'm very much ok with some access() extension, but I > think even that very much needs clarification, and the existing patch > is just odd in many many ways. This v8 was kind of an early version, I'll update everything. Thanks!
On 05/04/2022 00:25, Kees Cook wrote: > On Mon, Apr 04, 2022 at 02:28:19PM -0700, Linus Torvalds wrote: >> Now, what I *think* you mean is >> >> (1) user-space executable loaders want to be able to test the *same* >> policy as the kernel does for execve() > > Right. The script interpreter wants to ask "if this file were actually > an ELF going through execve(), would the kernel allow it?" The current behavior was a bit more flexible thanks to the sysctl. It was either the mount exec option check, the file perm check or both. The rationale is to let sysadmins adapt their system to existing applications/interpreters without breaking. Only basing the check on mount exec and file perm could be an issue in the short term, but I guess it would deter inconsistencies in existing systems… I'm not sure it is a wise move because if no interpreter want to use this check it would then be useless. See commit message in https://lore.kernel.org/all/20220104155024.48023-3-mic@digikod.net/ and the trusted_for_policy sysctl documentation: "Even without enforced security policy, user space interpreters can use this syscall to try as much as possible to enforce the system policy at their level, knowing that it will not break anything on running systems which do not care about this feature. However, on systems which want this feature enforced, there will be knowledgeable people (i.e. system administrator who configured fs.trusted_for_policy deliberately) to manage it. [...]" > >> (2) access(path, EXECVE_OK) will do the same permission checks as >> "execve()" would do for that path > > Maybe. I defer to Mickaël here, but my instinct is to avoid creating an > API that can be accidentally misused. I'd like this to be fd-only based, > since that removes path name races. (e.g. trusted_for() required an fd.) The fd-based approach is definitely better from a security point of view but there is indeed a use case for pathnames. I guess we could highlight this point in the documentation. > >> (3) if you already have the fd open, use "faccess(fd, NULL, >> F_OK_TO_EXECUTE, AT_EMPTY_PATH)" > > Yes, specifically faccessat2(). (And continuing the race thought above, > yes, there could still be races if the content of the file could be > changed, but that case is less problematic under real-world conditions.) I'm not worried about changes in the file once it is opened. This could be an issue but not in the kernel (e.g. flaky update system). [...]
On 05/04/2022 01:26, Linus Torvalds wrote: > On Mon, Apr 4, 2022 at 3:25 PM Kees Cook <keescook@chromium.org> wrote: [...] > >> I think this already exists as AT_EACCESS? It was added with >> faccessat2() itself, if I'm reading the history correctly. > > Yeah, I noticed myself, I just hadn't looked (and I don't do enough > user-space programming to be aware of if that way). I think AT_EACCESS should be usable with the new EXECVE_OK too. > >>> (a) "what about suid bits that user space cannot react to" >> >> What do you mean here? Do you mean setid bits on the file itself? > > Right. > > Maybe we don't care. I think we don't. I think the only corner case that could be different is for files that are executable, SUID and non-readable. In this case it wouldn't matter because userspace could not read the file, which is required for interpretation/execution. Anyway, S[GU]ID bits in scripts are just ignored by execve and we want to follow the same semantic. > > Maybe we do. > > Is the user-space loader going to honor them? Is it going to ignore > them? I don't know. And it actually interacts with things like > 'nosuid', which the kernel does know about, and user space has a hard > time figuring out. > > So if the point is "give me an interface so that I can do the same > thing a kernel execve() loader would do", then those sgid/suid bits > actually may be exactly the kind of thing that user space wants the > kernel to react to - should it ignore them, or should it do something > special when it sees that they are set? > > I'm not saying that they *should* be something we care about. All I'm > saying is that I want that *discussion* to happen. > > Linus
On 05/04/2022 16:54, Theodore Ts'o wrote: > On Mon, Apr 04, 2022 at 10:30:13PM +0200, Mickaël Salaün wrote: >>> If you add a new X_OK variant to access(), maybe that could fly. >> >> As answered in private, that was the approach I took for one of the early >> versions but a dedicated syscall was requested by Al Viro: >> https://lore.kernel.org/r/2ed377c4-3500-3ddc-7181-a5bc114ddf94@digikod.net >> The main reason behind this request was that it doesn't have the exact same >> semantic as faccessat(2). The changes for this syscall are documented here: >> https://lore.kernel.org/all/20220104155024.48023-3-mic@digikod.net/ >> The whole history is linked in the cover letter: >> https://lore.kernel.org/all/2ed377c4-3500-3ddc-7181-a5bc114ddf94@digikod.net/ > > As a suggestion, something that can be helpful for something which has > been as heavily bike-sheded as this concept might be to write a > "legislative history", or perhaps, a "bike shed history". > > And not just with links to mailing list discussions, but a short > summary of why, for example, we moved from the open flag O_MAYEXEC to > the faccessat(2) approach. I looked, but I couldn't find the > reasoning while diving into the mail archives. And there was some > kind of request for some new functionality for IMA and other LSM's > that I couldn't follow that is why the new AT_INTERETED flag, but at > this point my time quantuum for mailing list archeology most > definitely expired. :-) > > It might be that when all of this is laid out, we can either revisit > prior design decisions as "that bike-shed request to support this > corner case was unreasonable", or "oh, OK, this is why we need as > fully general a solution as this". > > Also, some of examples of potential future use cases such as "magic > links" that were linked in the cover letter, it's not clear to me > actually make sense in the context of a "trusted for" system call > (although might make more sense in the context of an open flag). So > revisiting some of those other cases to see whether they actually > *could* be implemented as new "TRUSTED_FOR" flags might be > instructive. > > Personally, I'm a bit skeptical about the prospct of additional use > cases, since trusted_for(2) is essentially a mother_should_I(2) That would be an interesting syscall name. ;) > request where userspace is asking the kernel whether they should go > ahead and do some particular policy thing. And it's not clear to me > how many of these policy questions exist where (a) the kernel is in > the past position to answer that question, and (b) there isn't some > additional information that the kernel doesn't have that might be > needed to answer that question. Script execution is definitely the main use case and the semantic is already known by the kernel. > > For example, "Mother should I use that private key file" might require > information about whether the SRE is currently on pager duty or not, > at least for some policies, and the kernel isn't going to have that > information. > > Other examples of TRUSTED_FOR flags that really make sense and would > be useful might help alleviate my skepticsm. And the "bike shed > history" would help with my question about why some folks didn't like > the original O_MAYEXEC flag? Thanks, I'll do some that. > > Cheers, > > - Ted
On Tue, Apr 5, 2022 at 9:08 AM Mickaël Salaün <mic@digikod.net> wrote: > > I think we don't. I think the only corner case that could be different > is for files that are executable, SUID and non-readable. In this case it > wouldn't matter because userspace could not read the file, which is > required for interpretation/execution. Anyway, S[GU]ID bits in scripts > are just ignored by execve and we want to follow the same semantic. So I just want to bring up the possibility that somebody wants to just implement execve() in user space for some reason - not just "script interpreter". It's *doable*. Don't ask me if it's sane or useful, but people have done insane things before. Things like "emulate other operating systems in user space" etc Such a user can trivially see the suid/sgid bit on the file (just do fstat() on it), but wouldn't necessarily see if that file is then in a mount that is mounted nosuid. Now, I think the right thing to do is to just say "we don't support it", but I do think it should perhaps be mentioned somewhere explicitly. Particularly since I can well imagine that a security policy might have some "no, I don't allow suid exec" and return an actual error for it, and then the access() call would fail for that case. (Ok, so the security policies would look at the actual bprm data on a real exec, not the inode executable, so that's kind of made up and theoretical, but I just want this issue to be mentioned somewhere so that people are aware that the "it's the same basic file checking that execve does, but a _real_ execve might then have _other_ issues going on, including suid bits etc") Linus
On Mon, Apr 04, 2022 at 04:26:44PM -0700, Linus Torvalds wrote: > > > (a) "what about suid bits that user space cannot react to" > > > > What do you mean here? Do you mean setid bits on the file itself? > > Right. > > Maybe we don't care. > > Maybe we do. > > Is the user-space loader going to honor them? Is it going to ignore > them? I don't know. And it actually interacts with things like > 'nosuid', which the kernel does know about, and user space has a hard > time figuring out. So there *used* to be suidperl which was a setuid version of perl with some extra security checks. (See [1] for more details.) The suidperl binary would be used by #!/usr/bin/perl so it could honor setuid bits on perl scripts, but it was deprecated in Perl 5.8 and removed in Perl 5.12 in 2010[2]. [1] https://mattmccutchen.net/suidperl.html [2] https://metacpan.org/release/SHAY/perl-5.20.2/view/pod/perl5120delta.pod#Deprecations So it's possible that the user-space loader might try to honor them, and if there was such an example "in the field", it might be nice if there was a way for the kernel to advise userspace about the nosuid. But I'm not aware of any other shell script interpreter that tried do what perl did with suidperl. > So if the point is "give me an interface so that I can do the same > thing a kernel execve() loader would do", then those sgid/suid bits > actually may be exactly the kind of thing that user space wants the > kernel to react to - should it ignore them, or should it do something > special when it sees that they are set? > > I'm not saying that they *should* be something we care about. All I'm > saying is that I want that *discussion* to happen. I'm not convinced we should. I suppose *if* the shell script was suid, *and* the file system was mounted nosuid, then the check could return false, and that would be mostly harmless even if the script interpreter didn't support setuid. But it's extra complexity, and in theory it could break a setuid script, where the setuid bit was previously a no-op, and it now might cause a problem for that user. - Ted
*thread necromancy* On Tue, Apr 05, 2022 at 06:09:03PM +0200, Mickaël Salaün wrote: > > On 05/04/2022 01:26, Linus Torvalds wrote: > > On Mon, Apr 4, 2022 at 3:25 PM Kees Cook <keescook@chromium.org> wrote: > > [...] > > > > > > I think this already exists as AT_EACCESS? It was added with > > > faccessat2() itself, if I'm reading the history correctly. > > > > Yeah, I noticed myself, I just hadn't looked (and I don't do enough > > user-space programming to be aware of if that way). > > I think AT_EACCESS should be usable with the new EXECVE_OK too. > > > > > > > > (a) "what about suid bits that user space cannot react to" > > > > > > What do you mean here? Do you mean setid bits on the file itself? > > > > Right. > > > > Maybe we don't care. > > I think we don't. I think the only corner case that could be different is > for files that are executable, SUID and non-readable. In this case it > wouldn't matter because userspace could not read the file, which is required > for interpretation/execution. Anyway, S[GU]ID bits in scripts are just > ignored by execve and we want to follow the same semantic. Hi Mickaël, Is there a new version of this being worked on? It would be really nice to have the O_MAYEXEC/faccessat2() visibility for script execution control in userspace. It seems like it would be mainly a respin of an earlier version of this series before trusted_for() was proposed. -Kees
On 08/02/2023 20:32, Kees Cook wrote: > *thread necromancy* > > On Tue, Apr 05, 2022 at 06:09:03PM +0200, Mickaël Salaün wrote: >> >> On 05/04/2022 01:26, Linus Torvalds wrote: >>> On Mon, Apr 4, 2022 at 3:25 PM Kees Cook <keescook@chromium.org> wrote: >> >> [...] >> >>> >>>> I think this already exists as AT_EACCESS? It was added with >>>> faccessat2() itself, if I'm reading the history correctly. >>> >>> Yeah, I noticed myself, I just hadn't looked (and I don't do enough >>> user-space programming to be aware of if that way). >> >> I think AT_EACCESS should be usable with the new EXECVE_OK too. >> >> >>> >>>>> (a) "what about suid bits that user space cannot react to" >>>> >>>> What do you mean here? Do you mean setid bits on the file itself? >>> >>> Right. >>> >>> Maybe we don't care. >> >> I think we don't. I think the only corner case that could be different is >> for files that are executable, SUID and non-readable. In this case it >> wouldn't matter because userspace could not read the file, which is required >> for interpretation/execution. Anyway, S[GU]ID bits in scripts are just >> ignored by execve and we want to follow the same semantic. > > Hi Mickaël, > > Is there a new version of this being worked on? It would be really nice > to have the O_MAYEXEC/faccessat2() visibility for script execution control > in userspace. It seems like it would be mainly a respin of an earlier > version of this series before trusted_for() was proposed. Yes, I plan to send a new version in a few weeks. > > -Kees >