Message ID | db1c1de0-3672-4bae-ef45-c554379f36f4@omp.ru (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | NAX (No Anonymous Execution) LSM | expand |
Hello Igor, first, a disclaimer: this is the first time I review a kernel patch, so everything I say must a taken with a grain of salt (a handful, really). On 7/7/21 3:03 AM, Igor Zhbanov wrote: > NAX (No Anonymous Execution) is a Linux Security Module that extends DAC > by making impossible to make anonymous and modified pages executable for > privileged processes. > > The module intercepts anonymous executable pages created with mmap() and > mprotect() system calls. > > Depending on the settings, the module can block and log violating system > calls or log and kill the violating process. > From what I see, there is also a log-only mode (MODE_PERMISSIVE) that users may use to try this patch before deploying it. You could probably express that. Something like "This module will log violations, and it can also block the action or kill the offending process, depending on the enabled settings." maybe? > Signed-off-by: Igor Zhbanov <i.zhbanov@omp.ru> > --- > Documentation/admin-guide/LSM/NAX.rst | 48 ++++ > Documentation/admin-guide/LSM/index.rst | 1 + > security/Kconfig | 11 +- > security/Makefile | 2 + > security/nax/Kconfig | 71 +++++ > security/nax/Makefile | 4 + > security/nax/nax-lsm.c | 344 ++++++++++++++++++++++++ > 7 files changed, 476 insertions(+), 5 deletions(-) > create mode 100644 Documentation/admin-guide/LSM/NAX.rst > create mode 100644 security/nax/Kconfig > create mode 100644 security/nax/Makefile > create mode 100644 security/nax/nax-lsm.c > > diff --git a/Documentation/admin-guide/LSM/NAX.rst b/Documentation/admin-guide/LSM/NAX.rst > new file mode 100644 > index 000000000000..b742f881f3d7 > --- /dev/null > +++ b/Documentation/admin-guide/LSM/NAX.rst > @@ -0,0 +1,48 @@ > +======= > +NAX LSM > +======= > + > +:Author: Igor Zhbanov > + > +NAX (No Anonymous Execution) is a Linux Security Module that extends DAC > +by making impossible to make anonymous and modified pages executable for > +privileged processes. The module intercepts anonymous executable pages > +created with mmap() and mprotect() system calls. > + > +To select it at boot time, specify ``security=nax`` (though this will > +disable any other LSM). > + > +The privileged process is a process for which any of the following is true: > + > +- ``uid == 0`` > +- ``euid == 0`` > +- ``suid == 0`` > +- ``fsuid == 0`` > +- ``cap_effective`` has any capability except of ``kernel.nax.allowed_caps`` > +- ``cap_permitted`` has any capability except of ``kernel.nax.allowed_caps`` > + Maybe "except those explicitly allowed in" or "except for the ones allowed in"? > +The following sysctl parameters are available: > + > +* ``kernel.nax.allowed_caps``: > + > + Hexadecimal number representing allowed capabilities set for the privileged > + processes. > + Maybe "representing the set of capabilities a non-root process can possess without being considered "privileged" by NAX"? > +* ``kernel.nax.enforcing``: > + > + - 0: Only log errors (when enabled by ``kernel.nax.quiet``) > + - 1: Forbid unsafe pages mappings (default) > + > +* ``kernel.nax.locked``: > + > + - 0: Changing of the module's sysctl parameters is allowed > + - 1: Further changing of the module's sysctl parameters is forbidden > + > + Setting this parameter to ``1`` after initial setup during the system boot > + will prevent the module disabling at the later time. > + > +* ``kernel.nax.quiet``: > + > + - 0: Log violations (default) > + - 1: Be quiet > + - 2: Kill the violating process and log "quiet" probably not the right word to express that fact that it controls the NAX execution mode. Something like "mode" or "level" would probably be clearer. But while reading the patch below I believe this doc is simply a little out of date, because quiet can only take two values: 0 or 1 (otherwise it would be redundant with the 'enforcing' sysctl). So you should consider updating this document. > diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst > index a6ba95fbaa9f..e9df7fc9a461 100644 > --- a/Documentation/admin-guide/LSM/index.rst > +++ b/Documentation/admin-guide/LSM/index.rst > @@ -42,6 +42,7 @@ subdirectories. > > apparmor > LoadPin > + NAX > SELinux > Smack > tomoyo > diff --git a/security/Kconfig b/security/Kconfig > index 0ced7fd33e4d..771419647ae1 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -239,6 +239,7 @@ source "security/yama/Kconfig" > source "security/safesetid/Kconfig" > source "security/lockdown/Kconfig" > source "security/landlock/Kconfig" > +source "security/nax/Kconfig" > > source "security/integrity/Kconfig" > > @@ -278,11 +279,11 @@ endchoice > > config LSM > string "Ordered list of enabled LSMs" > - default "landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK > - default "landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR > - default "landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO > - default "landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC > - default "landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" > + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK > + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR > + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO > + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC > + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" I don't know the policy with regard to new LSMs, but enabling this one by default is maybe a bit violent? That said, considering the default mode is SECURITY_NAX_MODE_LOG, this would just log kernel messages and not break existing systems, so this shouldn't be a problem. > help > A comma-separated list of LSMs, in initialization order. > Any LSMs left off this list will be ignored. This can be > diff --git a/security/Makefile b/security/Makefile > index 47e432900e24..5c261bbf4659 100644 > --- a/security/Makefile > +++ b/security/Makefile > @@ -14,6 +14,7 @@ subdir-$(CONFIG_SECURITY_SAFESETID) += safesetid > subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown > subdir-$(CONFIG_BPF_LSM) += bpf > subdir-$(CONFIG_SECURITY_LANDLOCK) += landlock > +subdir-$(CONFIG_SECURITY_NAX) += nax > > # always enable default capabilities > obj-y += commoncap.o > @@ -34,6 +35,7 @@ obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown/ > obj-$(CONFIG_CGROUPS) += device_cgroup.o > obj-$(CONFIG_BPF_LSM) += bpf/ > obj-$(CONFIG_SECURITY_LANDLOCK) += landlock/ > +obj-$(CONFIG_SECURITY_NAX) += nax/ > > # Object integrity file lists > subdir-$(CONFIG_INTEGRITY) += integrity > diff --git a/security/nax/Kconfig b/security/nax/Kconfig > new file mode 100644 > index 000000000000..60ef0964f00a > --- /dev/null > +++ b/security/nax/Kconfig > @@ -0,0 +1,71 @@ > +# SPDX-License-Identifier: GPL-2.0-only > +config SECURITY_NAX > + bool "NAX support" > + depends on SECURITY > + default n > + help > + This selects NAX (No Anonymous Execution), which extends DAC > + support with additional system-wide security settings beyond > + regular Linux discretionary access controls. Currently available > + is restriction to make anonymous and modified pages executable > + to privileged processes. Like capabilities, this security module > + stacks with other LSMs. Further information can be found in > + Documentation/admin-guide/LSM/NAX.rst. > + > + If you are unsure how to answer this question, answer N. > + > +config SECURITY_NAX_LOCKED > + bool "Lock NAX settings" > + depends on SECURITY_NAX > + help > + If selected, it will not be possible to change enforcing and quiet > + settings via sysctl or the kernel command line. If not selected, > + it can be enabled at boot with the kernel parameter "nax_locked=1" > + or "kernel.nax_locked=1" sysctl (if the settings are not locked). > + > +config SECURITY_NAX_QUIET > + bool "Silence NAX messages" > + depends on SECURITY_NAX > + help > + If selected, NAX will not print violations. If not selected, it can > + be enabled at boot with the kernel parameter "nax_quiet=1" or > + "kernel.nax_quiet=1" sysctl (if the settings are not locked). > + You probably should document the boot option in the kernel-parameters.txt file. > +choice > + prompt "NAX violation action mode" > + default SECURITY_NAX_MODE_LOG > + depends on SECURITY_NAX > + help > + Select the NAX violation action mode. > + > + In the default permissive mode the violations are only logged > + (if logging is not suppressed). In the enforcing mode the violations > + are prohibited. And in the kill mode the process is terminated. > + > + The value can be changed at boot with the kernel parameter > + "nax_mode" (0, 1, 2) or "kernel.nax_mode=" (0, 1, 2) sysctl (if the > + settings are not locked). > + > + config SECURITY_NAX_MODE_LOG > + bool "Permissive mode" > + help > + In this mode violations are only logged (if logging is not > + suppressed). > + config SECURITY_NAX_MODE_ENFORCING > + bool "Enforcing mode" > + help > + In this mode violations are prohibited and logged (if > + logging is not suppressed). > + config SECURITY_NAX_MODE_KILL > + bool "Kill mode" > + help > + In this mode the voilating process is terminated. The event "violating" > + is logged (if logging is not suppressed). > +endchoice > + > +config SECURITY_NAX_MODE > + int > + depends on SECURITY_NAX > + default 0 if SECURITY_NAX_MODE_LOG > + default 1 if SECURITY_NAX_MODE_ENFORCING > + default 2 if SECURITY_NAX_MODE_KILL > diff --git a/security/nax/Makefile b/security/nax/Makefile > new file mode 100644 > index 000000000000..9c3372210c77 > --- /dev/null > +++ b/security/nax/Makefile > @@ -0,0 +1,4 @@ > +# SPDX-License-Identifier: GPL-2.0-only > +obj-$(CONFIG_SECURITY_NAX) := nax.o > + > +nax-y := nax-lsm.o > diff --git a/security/nax/nax-lsm.c b/security/nax/nax-lsm.c > new file mode 100644 > index 000000000000..ef99d9b36a9d > --- /dev/null > +++ b/security/nax/nax-lsm.c > @@ -0,0 +1,344 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* Copyright (C) 2016-2021 Open Mobile Platform LLC. > + * > + * Written by: Igor Zhbanov <i.zhbanov@omp.ru, izh1979@gmail.com> > + * > + * NAX (No Anonymous Execution) Linux Security Module > + * This module prevents execution of the code in anonymous or modified pages. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2, as > + * published by the Free Software Foundation. */ Per checkpatch: WARNING: Block comments use a trailing */ on a separate line #328: FILE: security/nax/nax-lsm.c:48: + * or it has any unsafe capability (even in a user namespace) */ Checkpatch also complains about a few places where you could use tabs instead of spaces, or add a space, and so on. As a general rule, running 'scripts/checkpatch.pl' prior to sending the patch is considered a good practice (see 'Documentation/process/submitting-patches.rst'). > + > +#define pr_fmt(fmt) "NAX: " fmt > + > +#include <linux/capability.h> > +#include <linux/cred.h> > +#include <linux/ctype.h> > +#include <linux/lsm_hooks.h> > +#include <linux/mman.h> > +#include <linux/sched.h> > +#include <linux/securebits.h> > +#include <linux/security.h> > +#include <linux/sysctl.h> > +#include <linux/uidgid.h> > + > +#define NAX_MODE_PERMISSIVE 0 /* Log only */ > +#define NAX_MODE_ENFORCING 1 /* Enforce and log */ > +#define NAX_MODE_KILL 2 /* Kill process and log */ > + > +static int mode = CONFIG_SECURITY_NAX_MODE, > + quiet = IS_ENABLED(CONFIG_SECURITY_NAX_QUIET), > + locked = IS_ENABLED(CONFIG_SECURITY_NAX_LOCKED); > + > +#define ALLOWED_CAPS_HEX_LEN (_KERNEL_CAPABILITY_U32S * 8) > + > +static char allowed_caps_hex[ALLOWED_CAPS_HEX_LEN + 1]; > +static kernel_cap_t allowed_caps = CAP_EMPTY_SET; > + > +static int > +is_privileged_process(void) > +{ > + const struct cred *cred; > + kuid_t root_uid; > + > + cred = current_cred(); > + root_uid = make_kuid(cred->user_ns, 0); > + /* We count a process as privileged if it any of its IDs is zero > + * or it has any unsafe capability (even in a user namespace) */ > + if ((!issecure(SECURE_NOROOT) && (uid_eq(cred->uid, root_uid) || > + uid_eq(cred->euid, root_uid) || > + uid_eq(cred->suid, root_uid) || > + uid_eq(cred->fsuid, root_uid))) || > + (!cap_issubset(cred->cap_effective, allowed_caps)) || > + (!cap_issubset(cred->cap_permitted, allowed_caps))) > + return 1; > + > + return 0; > +} > + > +static void > +log_warn(const char *reason) > +{ > + if (quiet) > + return; > + > + pr_warn_ratelimited("%s: pid=%d, uid=%u, comm=\"%s\"\n", > + reason, current->pid, > + from_kuid(&init_user_ns, current_cred()->uid), > + current->comm); Have you considered writing to the audit log instead of the kernel messages directly? (not saying that this is necessarily better, but is there a reasoning to prefer one or the other here? Audit logs are often consumed by automated tools and it may be more pratical for people to detect and treat violations if the messages were pushed to the audit log - but conversely, that requires defining and maintaining a stable log format for consumers) > +} > + > +static void > +kill_current_task(void) > +{ > + pr_warn("Killing pid=%d, uid=%u, comm=\"%s\"\n", > + current->pid, from_kuid(&init_user_ns, current_cred()->uid), > + current->comm); The same question applies here. > + force_sig(SIGKILL); > +} > + > +static int > +nax_mmap_file(struct file *file, unsigned long reqprot, > + unsigned long prot, unsigned long flags) > +{ > + int ret = 0; > + > + if (mode == NAX_MODE_PERMISSIVE && quiet) > + return 0; /* Skip further checks in this case */ > + > + if (!(prot & PROT_EXEC)) /* Not executable memory */ > + return 0; > + > + if (!is_privileged_process()) > + return 0; /* Not privileged processes can do anything */ > + > + if (!file) { /* Anonymous executable memory */ > + log_warn("MMAP_ANON_EXEC"); > + ret = -EACCES; > + } else if (prot & PROT_WRITE) { /* Mapping file RWX */ > + log_warn("MMAP_FILE_WRITE_EXEC"); > + ret = -EACCES; > + } > + > + if (mode == NAX_MODE_KILL) > + kill_current_task(); > + > + return (mode != NAX_MODE_PERMISSIVE) ? ret : 0; > +} > + > +static int > +nax_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot, > + unsigned long prot) > +{ > + if (mode == NAX_MODE_PERMISSIVE && quiet) > + return 0; /* Skip further checks in this case */ > + > + if (!(prot & PROT_EXEC)) /* Not executable memory */ > + return 0; > + > + if (!is_privileged_process()) > + return 0; /* Not privileged processes can do anything */ > + > + if (!(vma->vm_flags & VM_EXEC)) { > + int ret = 0; > + > + if (vma->vm_start >= vma->vm_mm->start_brk && > + vma->vm_end <= vma->vm_mm->brk) { > + log_warn("MPROTECT_EXEC_HEAP"); > + ret = -EACCES; > + } else if (!vma->vm_file && > + ((vma->vm_start <= vma->vm_mm->start_stack && > + vma->vm_end >= vma->vm_mm->start_stack) || > + vma_is_stack_for_current(vma))) { > + log_warn("MPROTECT_EXEC_STACK"); > + ret = -EACCES; > + } else if (vma->vm_file && vma->anon_vma) { > + /* We are making executable a file mapping that has > + * had some COW done. Since pages might have been > + * written, check ability to execute the possibly > + * modified content. This typically should only > + * occur for text relocations. */ > + log_warn("MPROTECT_EXEC_MODIFIED"); > + ret = -EACCES; > + } > + > + if (ret) { > + if (mode == NAX_MODE_KILL) > + kill_current_task(); > + > + return (mode != NAX_MODE_PERMISSIVE) ? ret : 0; > + } > + } > + > + return nax_mmap_file(vma->vm_file, reqprot, prot, > + vma->vm_flags & VM_SHARED); Considering many checks in nax_mmap_file were already done in this function, wouldn't it be simpler to write the rest the function too (and you could distinguish MRPOTECT_ANON_EXEC and MMAP_ANON_EXEC that way). What do you think of something like: > - > - if (ret) { > - if (mode == NAX_MODE_KILL) > - kill_current_task(); > - > - return (mode != NAX_MODE_PERMISSIVE) ? ret : 0; > - } > - } > - > - return nax_mmap_file(vma->vm_file, reqprot, prot, > - vma->vm_flags & VM_SHARED); > + } else { > + if (!vma->vm_file) { /* Anonymous executable memory */ > + log_warn("MRPOTECT_ANON_EXEC"); > + ret = -EACCES; > + } else if (prot & PROT_WRITE) { /* remapping the file as RWX */ > + log_warn("MPROTECT_FILE_WRITE_EXEC"); > + ret = -EACCES; > + } > + } > + > + if (ret && mode == NAX_MODE_KILL) > + kill_current_task(); > + > + return (mode != NAX_MODE_PERMISSIVE) ? ret : 0; > +} > + > +static struct security_hook_list nax_hooks[] __lsm_ro_after_init = { > + LSM_HOOK_INIT(mmap_file, nax_mmap_file), > + LSM_HOOK_INIT(file_mprotect, nax_file_mprotect), > +}; > + > +#ifdef CONFIG_SYSCTL > + > +static int > +nax_dointvec_minmax(struct ctl_table *table, int write, > + void *buffer, size_t *lenp, loff_t *ppos) > +{ > + if (write && (!capable(CAP_SYS_ADMIN) || locked)) > + return -EPERM; > + > + return proc_dointvec_minmax(table, write, buffer, lenp, ppos); > +} > + > +static int > +nax_dostring(struct ctl_table *table, int write, void *buffer, > + size_t *lenp, loff_t *ppos) > +{ > + if (write) { > + int error; > + char *buf = (char *)buffer; > + size_t len = *lenp, i; > + kernel_cap_t caps = CAP_EMPTY_SET; The kernel style guide mandates that all variables are only defined at the beggining of the function, and not at the beggining of any block like C89. > + > + if (!capable(CAP_SYS_ADMIN) || locked) > + return -EPERM; > + > + /* Do not allow trailing garbage or excessive length */ > + if (len == ALLOWED_CAPS_HEX_LEN + 1) { > + if (buf[--len] != '\n') > + return -EINVAL> + } else if (len > ALLOWED_CAPS_HEX_LEN || len <= 0) > + return -EINVAL; > + > + if ((error = proc_dostring(table, write, buffer, lenp, ppos))) > + return error; > + Nit: considering that allowed_caps_hex is only used in this function, and that it represents a small amount of stack space, couldn't you define it directly in the function? > + len = strlen(allowed_caps_hex); > + for (i = 0; i < len; i++) > + if (!isxdigit(allowed_caps_hex[i])) > + return -EINVAL;> + > + for (i = 0; i < _KERNEL_CAPABILITY_U32S; i++) { > + unsigned long l; > + > + if (kstrtoul(allowed_caps_hex + > + (len >= 8 ? len - 8 : 0), 16, &l)) > + return -EINVAL; > + > + caps.cap[i] = l; > + if (len < 8) > + break; > + > + len -= 8; > + allowed_caps_hex[len] = '\0'; > + } > + > + allowed_caps = cap_intersect(caps, CAP_FULL_SET); This operation doesn't look atomic to me, and I think other operations couldn run while this write is ongoing. While users will probably not write to this often (and a proper system would be locked anyway, otherwise the attacker would just have to write to the NAX "mode" sysctl to disable it and carry on his attack), it could break programs (deny the action of kill the process) that perform a mmap() or a mprotect() and read garbage in allowed_caps because a write to the structure was running concurrently. You should maybe consider a way to ensure either the update is atomic or the update is in a critical section. > + return 0; > + } else { > + unsigned i; > + > + CAP_FOR_EACH_U32(i) > + snprintf(allowed_caps_hex + i * 8, 9, "%08x", > + allowed_caps.cap[CAP_LAST_U32 - i]); > + > + return proc_dostring(table, write, buffer, lenp, ppos); > + } > +} > + > +struct ctl_path nax_sysctl_path[] = { > + { .procname = "kernel" }, > + { .procname = "nax" }, > + { } > +}; > + > +static int max_mode = NAX_MODE_KILL; > + > +static struct ctl_table nax_sysctl_table[] = { > + { > + .procname = "allowed_caps", > + .data = allowed_caps_hex, > + .maxlen = ALLOWED_CAPS_HEX_LEN + 1, > + .mode = 0644, > + .proc_handler = nax_dostring, > + }, { > + .procname = "locked", > + .data = &locked, > + .maxlen = sizeof(int), > + .mode = 0644, > + .proc_handler = nax_dointvec_minmax, > + .extra1 = SYSCTL_ZERO, > + .extra2 = SYSCTL_ONE, > + }, { > + .procname = "mode", > + .data = &mode, > + .maxlen = sizeof(int), > + .mode = 0644, > + .proc_handler = nax_dointvec_minmax, > + .extra1 = SYSCTL_ZERO, > + .extra2 = &max_mode, > + }, { > + .procname = "quiet", > + .data = &quiet, > + .maxlen = sizeof(int), > + .mode = 0644, > + .proc_handler = nax_dointvec_minmax, > + .extra1 = SYSCTL_ZERO, > + .extra2 = SYSCTL_ONE, > + }, > + { } > +}; > + > +static void __init > +nax_init_sysctl(void) > +{ > + if (!register_sysctl_paths(nax_sysctl_path, nax_sysctl_table)) > + panic("NAX: sysctl registration failed.\n"); > +} > + > +#else /* !CONFIG_SYSCTL */ > + > +static inline void > +nax_init_sysctl(void) > +{ > + > +} > + > +#endif /* !CONFIG_SYSCTL */ > + > +static int __init setup_mode(char *str) > +{ > + unsigned long val; > + > + if (!locked && !kstrtoul(str, 0, &val)) { > + if (val > max_mode){ > + pr_err("Invalid 'nax_mode' parameter value (%s)\n", > + str); > + val = max_mode; > + } > + > + mode = val; > + } > + > + return 1; > +} > +__setup("nax_mode=", setup_mode); > + > +static int __init setup_quiet(char *str) > +{ > + unsigned long val; > + > + if (!locked && !kstrtoul(str, 0, &val)) > + quiet = val ? 1 : 0; The order of the kernel parameters will have an impact on NAX behavior. nax_mode=1 nax_locked=1 and nax_locked=1 nax_mode=1 will end up with the same behavior. in the first case the mode is enforced, in the second case it isn't (well unless you changed the kernel configuration from the default) and it can't be enabled later either. Is that desired? > + > + return 1; > +} > +__setup("nax_quiet=", setup_quiet); > + > +static int __init setup_locked(char *str) > +{ > + unsigned long val; > + > + if (!locked && !kstrtoul(str, 0, &val)) > + locked = val ? 1 : 0; > + > + return 1; > +} > +__setup("nax_locked=", setup_locked); > + > +static __init int > +nax_init(void) > +{ > + pr_info("Starting.\n"); > + security_add_hooks(nax_hooks, ARRAY_SIZE(nax_hooks), "nax"); > + nax_init_sysctl(); > + > + return 0; > +} > + > +DEFINE_LSM(nax) = { > + .name = "nax", > + .init = nax_init, > +}; > Review aside, this patch is certainly interesting for providing a simple way to block anonymous excutable mappins, so thanks for the submission. I have to wonder: wouldn't it be interesting to add an option to enable NAX for all processes on the system, as you mentioned in the cover letter? Have a nice day, Simon
snip... >> +#define pr_fmt(fmt) "NAX: " fmt >> + >> +#include <linux/capability.h> >> +#include <linux/cred.h> >> +#include <linux/ctype.h> >> +#include <linux/lsm_hooks.h> >> +#include <linux/mman.h> >> +#include <linux/sched.h> >> +#include <linux/securebits.h> >> +#include <linux/security.h> >> +#include <linux/sysctl.h> >> +#include <linux/uidgid.h> >> + >> +#define NAX_MODE_PERMISSIVE 0 /* Log only */ >> +#define NAX_MODE_ENFORCING 1 /* Enforce and log */ >> +#define NAX_MODE_KILL 2 /* Kill process and log */ >> + >> +static int mode = CONFIG_SECURITY_NAX_MODE, >> + quiet = IS_ENABLED(CONFIG_SECURITY_NAX_QUIET), >> + locked = IS_ENABLED(CONFIG_SECURITY_NAX_LOCKED); >> + >> +#define ALLOWED_CAPS_HEX_LEN (_KERNEL_CAPABILITY_U32S * 8) >> + >> +static char allowed_caps_hex[ALLOWED_CAPS_HEX_LEN + 1]; >> +static kernel_cap_t allowed_caps = CAP_EMPTY_SET; >> + >> +static int >> +is_privileged_process(void) >> +{ >> + const struct cred *cred; >> + kuid_t root_uid; >> + >> + cred = current_cred(); >> + root_uid = make_kuid(cred->user_ns, 0); >> + /* We count a process as privileged if it any of its IDs is zero >> + * or it has any unsafe capability (even in a user namespace) */ >> + if ((!issecure(SECURE_NOROOT) && (uid_eq(cred->uid, root_uid) || >> + uid_eq(cred->euid, root_uid) || >> + uid_eq(cred->suid, root_uid) || >> + uid_eq(cred->fsuid, root_uid))) || >> + (!cap_issubset(cred->cap_effective, allowed_caps)) || >> + (!cap_issubset(cred->cap_permitted, allowed_caps))) >> + return 1; >> + >> + return 0; >> +} >> + >> +static void >> +log_warn(const char *reason) >> +{ >> + if (quiet) >> + return; >> + >> + pr_warn_ratelimited("%s: pid=%d, uid=%u, comm=\"%s\"\n", >> + reason, current->pid, >> + from_kuid(&init_user_ns, current_cred()->uid), >> + current->comm); > Have you considered writing to the audit log instead of the kernel messages directly? > (not saying that this is necessarily better, but is there a reasoning to prefer one or > the other here? Audit logs are often consumed by automated tools and it may be more pratical > for people to detect and treat violations if the messages were pushed to the audit log > - but conversely, that requires defining and maintaining a stable log format for consumers) It's a good idea to writing to the audit log, HOWEVER I'd want to know what all the rest of the LSMs are doing in a case like this. If all of them just write kernel messages, I'd want this module to also write just kernel messages for consistency sake for use with say, log harvesters for a SIEM/XDR system solution. Just in general I like the thought of this LSM. I used to work for a security company in which their cloud "watched" situations where mmap()/mprotect() would use anonymous executable pages for possible "dodgy" behavior. Jay
On 8/10/21 6:52 AM, J Freyensee wrote: [snip] >> Have you considered writing to the audit log instead of the kernel messages directly? >> (not saying that this is necessarily better, but is there a reasoning to prefer one or >> the other here? Audit logs are often consumed by automated tools and it may be more pratical >> for people to detect and treat violations if the messages were pushed to the audit log >> - but conversely, that requires defining and maintaining a stable log format for consumers) > > It's a good idea to writing to the audit log, HOWEVER I'd want to know > what all the rest of the LSMs are doing in a case like this. If all of > them just write kernel messages, I'd want this module to also write just > kernel messages for consistency sake for use with say, log harvesters > for a SIEM/XDR system solution. Right, after taking a quick look through the SafeSetID, YAMA and the future BRUTE LSM, it looks like they all use pr_warn/pr_notice. Only the MACs seem to make use of the audit log, so you can forget what I said about writing to the audit log, this shouldn't be necessary, and is probably a bad idea for consistency, as Jay said. Simon > > Just in general I like the thought of this LSM. I used to work for a > security company in which their cloud "watched" situations where > mmap()/mprotect() would use anonymous executable pages for possible > "dodgy" behavior. > > Jay >
Hi Simon, Thanks for thorough review. Will post the corrected version soon. > > @@ -278,11 +279,11 @@ endchoice > > > > config LSM > > string "Ordered list of enabled LSMs" > > - default "landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK > > - default "landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR > > - default "landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO > > - default "landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC > > - default "landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" > > + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK > > + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR > > + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO > > + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC > > + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" > > I don't know the policy with regard to new LSMs, but enabling this one by default is maybe a bit violent? > That said, considering the default mode is SECURITY_NAX_MODE_LOG, this would just log kernel messages and > not break existing systems, so this shouldn't be a problem. I've just oriended on landlock and lockdown. They are handled in the similar way. But, yes, by default NAX module doesn't block anything. If you suggest not to do that, I can remove it. > > +__setup("nax_mode=", setup_mode); > > + > > +static int __init setup_quiet(char *str) > > +{ > > + unsigned long val; > > + > > + if (!locked && !kstrtoul(str, 0, &val)) > > + quiet = val ? 1 : 0; > > The order of the kernel parameters will have an impact on NAX behavior. > > "nax_mode=1 nax_locked=1" and "nax_locked=1 nax_mode=1" will end up with the same behavior. > in the first case the mode is enforced, in the second case it isn't (well unless you changed > the kernel configuration from the default) and it can't be enabled later either. > > Is that desired? Yes. The idea is that on boot you can set-up the proper options then block further changing (especially turning the module off). Initially it was implemented for sysctl parameters, so, you can change something in init-scripts, then lock. Then, I've extended it to command-line parameters. I can ignore "locked" flag in setup_* functions to defer locking to sysctl parsing. (Unless our command-line is glued by the bootloader from several parts, so we want to lock it as early as possible...). Thanks.
Hi Simon, ср, 28 июл. 2021 г. в 13:19, THOBY Simon <Simon.THOBY@viveris.fr>: > Nit: considering that allowed_caps_hex is only used in this function, and that it represents a small amount of > stack space, couldn't you define it directly in the function? > > > + len = strlen(allowed_caps_hex); It is also used as sysctl parameter input buffer in nax_sysctl_table[] for "allowed_caps" parameter.
Hi Igor, On 8/12/21 10:24 PM, Igor Zhbanov wrote: > Hi Simon, > > ср, 28 июл. 2021 г. в 13:19, THOBY Simon <Simon.THOBY@viveris.fr>: >> Nit: considering that allowed_caps_hex is only used in this function, and that it represents a small amount of >> stack space, couldn't you define it directly in the function? >> >>> + len = strlen(allowed_caps_hex); > > It is also used as sysctl parameter input buffer in nax_sysctl_table[] > for "allowed_caps" parameter. > Yes, what I meant was that maybe you could just declare it at the beginning of the function, and not use it at all in the sysctl table. Because as I see it, you only use allowed_caps_hex in the sysctl table to copy the string to that temporary (variable), and its use is limited to that one function. Instead of: + if ((error = proc_dostring(table, write, buffer, lenp, ppos))) + return error; You could probably get away with something like: > +static int > +nax_dostring(struct ctl_table *table, int write, void *buffer, > + size_t *lenp, loff_t *ppos) > +{ + int error; + char allowed_caps_hex[ALLOWED_CAPS_HEX_LEN + 1]; [...] + len = strlen(allowed_caps_hex); + if (len > ALLOWED_CAPS_HEX_LEN) + return -EINVAL; + strncpy(allowed_caps_hex, buffer, ALLOWED_CAPS_HEX_LEN + 1); I have to admit that having nearly no kernel development experience, I'm not sure there is any guidance on the matter (maybe there is even guidance that recommend preferring what your original patch does to what I suggested) but I think it is unnecessary to have a variable accessible to the whole file but being used to hold (small) temporary data for a single function. In any case, I would appreciate comments for other reviewers if what I'm saying is completely wrong, so as to not mislead you in doing adverse changes. Thanks, Simon
Hi Simon, > Yes, what I meant was that maybe you could just declare it at the beginning of the function, > and not use it at all in the sysctl table. Because as I see it, you only use allowed_caps_hex in the sysctl > table to copy the string to that temporary (variable), and its use is limited to that one function. > > Instead of: > > + if ((error = proc_dostring(table, write, buffer, lenp, ppos))) > + return error; ... > You could probably get away with something like: ... >+ strncpy(allowed_caps_hex, buffer, ALLOWED_CAPS_HEX_LEN + 1); proc_dostring() is more than simple strncpy(). It is handling offsets too. I.e. if a user will try to write not from the starting position. But I've seen that some functions simply create an instance of struct ctl_table, fill it and call needed function. Thanks.
Hi Igor, On 8/12/21 6:43 PM, Igor Zhbanov wrote: > Hi Simon, > > Thanks for thorough review. Will post the corrected version soon. > >>> @@ -278,11 +279,11 @@ endchoice >>> >>> config LSM >>> string "Ordered list of enabled LSMs" >>> - default "landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK >>> - default "landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR >>> - default "landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO >>> - default "landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC >>> - default "landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" >>> + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK >>> + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR >>> + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO >>> + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC >>> + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" >> >> I don't know the policy with regard to new LSMs, but enabling this one by default is maybe a bit violent? >> That said, considering the default mode is SECURITY_NAX_MODE_LOG, this would just log kernel messages and >> not break existing systems, so this shouldn't be a problem. > > I've just oriended on landlock and lockdown. They are handled in the similar > way. But, yes, by default NAX module doesn't block anything. > If you suggest not to do that, I can remove it. If other LSMs are also enabled by default when added to the kernel, and keeping the idea that the default behavior is warning-only (warning in a rate-limited fashion, as you rightfully did, so people running JIT engines as root don't get their kernel log flooded with warnings), then I have no objection to that change. > >>> +__setup("nax_mode=", setup_mode); >>> + >>> +static int __init setup_quiet(char *str) >>> +{ >>> + unsigned long val; >>> + >>> + if (!locked && !kstrtoul(str, 0, &val)) >>> + quiet = val ? 1 : 0; >> >> The order of the kernel parameters will have an impact on NAX behavior. >> >> "nax_mode=1 nax_locked=1" and "nax_locked=1 nax_mode=1" will end up with the same behavior. >> in the first case the mode is enforced, in the second case it isn't (well unless you changed >> the kernel configuration from the default) and it can't be enabled later either. >> >> Is that desired? > > Yes. The idea is that on boot you can set-up the proper options then block > further changing (especially turning the module off). > Initially it was implemented for sysctl parameters, so, you can change > something in init-scripts, then lock. Then, I've extended it to command-line > parameters. > I can ignore "locked" flag in setup_* functions to defer locking to sysctl > parsing. (Unless our command-line is glued by the bootloader from several > parts, so we want to lock it as early as possible...). > I may have badly made my point (especially considering I made a lot of typos when writing that mail, for which I would like to apologize). My sentence "nax_mode=1 nax_locked=1" and "nax_locked=1 nax_mode=1" will end up with the same behavior. lacked the "not" word, and both options will NOT have the same behavior. What I wanted to say was that I think both parameter should have the same behavior, and that the ordering of the options shouldn't impact the end result, because order-dependent options may confuse users. For the matter of have a kernel commandline being the result of concatenations from multiple sources, I think that if any attacker is able to alter part of the command line, they can already write 'lsm=' to it and completely disable NAX, thus I'm not sure 'nax_locked' should impact other setup_* functions. I believe keeping the nax_locked parameter, but not checking for the 'locked' status in the other setup_* functions should be enought, as sysctls writes will still be protected by the 'locked' variable. > Thanks. > Thanks, Simon
Hi Igor, On 8/13/21 10:05 AM, Igor Zhbanov wrote: > Hi Simon, > >> Yes, what I meant was that maybe you could just declare it at the beginning of the function, >> and not use it at all in the sysctl table. Because as I see it, you only use allowed_caps_hex in the sysctl >> table to copy the string to that temporary (variable), and its use is limited to that one function. >> >> Instead of: >> >> + if ((error = proc_dostring(table, write, buffer, lenp, ppos))) >> + return error; > ... >> You could probably get away with something like: > ... >> + strncpy(allowed_caps_hex, buffer, ALLOWED_CAPS_HEX_LEN + 1); > > proc_dostring() is more than simple strncpy(). It is handling offsets too. > I.e. if a user will try to write not from the starting position. But > I've seen that some > functions simply create an instance of struct ctl_table, fill it and > call needed function. Oh you're right, I assumed the sysctls write always had to be written from position zero, but I just learned of 'sysctl_writes_strict': even though by default the kernel forbid writes at another offset than zero or partial writes on sysctl files, users can enable a more permissive behavior like 'SYSCTL_WRITES_LEGACY'. Sorry about that. > > Thanks. > Thanks, Simon
Hi Simon, ср, 28 июл. 2021 г. в 13:19, THOBY Simon <Simon.THOBY@viveris.fr>: > The kernel style guide mandates that all variables are only defined at the beggining of the > function, and not at the beggining of any block like C89. I've checked the kernel coding style and couldn't find anything about variables definition place. Could you, please, point me to the requirement? Scoping variables declaration makes the code better and more secure. Besides, I've checked the kernel sources and see many examples of that. Thank you.
Hi Simon, пт, 13 авг. 2021 г. в 11:08, THOBY Simon <Simon.THOBY@viveris.fr>: > For the matter of have a kernel commandline being the result of concatenations from multiple > sources, I think that if any attacker is able to alter part of the command line, they can > already write 'lsm=' to it and completely disable NAX, thus I'm not sure 'nax_locked' should > impact other setup_* functions. > > I believe keeping the nax_locked parameter, but not checking for the 'locked' status in the other setup_* > functions should be enought, as sysctls writes will still be protected by the 'locked' variable. I thought again about it. Currently it is possible to set parameters value in Kconfig, including "locked". So, if one needed some static configuration, that cannot be altered by any means, they can set the desired values at compilation time in Kconfig and it will be impossible to change it, nor by sysctl, nor by command-line. But if I remove that (!locked) check, then the command-line options would alway be able to override the compile-time configuration, including unlocking the locked state. Thank you.
Hi Igor, On 8/13/21 10:10 PM, Igor Zhbanov wrote: > Hi Simon, > > ср, 28 июл. 2021 г. в 13:19, THOBY Simon <Simon.THOBY@viveris.fr>: >> The kernel style guide mandates that all variables are only defined at the beggining of the >> function, and not at the beggining of any block like C89. > > I've checked the kernel coding style and couldn't find anything about > variables definition > place. Could you, please, point me to the requirement? After trying (without results) to find it in the kernel coding style, it seems I have accepted a maintainer preference as the official style guide without checking: https://lore.kernel.org/linux-integrity/bd785a9d0c029cec34514d306e110bdef945c13e.camel@linux.ibm.com/ (see "Move the variable definition to the beginning of the function.") I'm sorry about that, and for the time you may have lost trying to locate it in the style guide, which doesn't say (as far as I could see) anything about variable declarations. > > Scoping variables declaration makes the code better and more secure. Not arguing, I agree with you on the benefits for maintainability and readability of keeping variable declaration and use localized (which is also why I think 'allowed_caps_hex' would be better as a local variable). > Besides, I've checked the kernel sources and see many examples of that. > > Thank you. > Sorry again, Simon
Hi Igor, On 8/14/21 3:39 PM, Igor Zhbanov wrote: > Hi Simon, > > пт, 13 авг. 2021 г. в 11:08, THOBY Simon <Simon.THOBY@viveris.fr>: >> For the matter of have a kernel commandline being the result of concatenations from multiple >> sources, I think that if any attacker is able to alter part of the command line, they can >> already write 'lsm=' to it and completely disable NAX, thus I'm not sure 'nax_locked' should >> impact other setup_* functions. >> >> I believe keeping the nax_locked parameter, but not checking for the 'locked' status in the other setup_* >> functions should be enought, as sysctls writes will still be protected by the 'locked' variable. > > I thought again about it. Currently it is possible to set parameters > value in Kconfig, including "locked". > So, if one needed some static configuration, that cannot be altered by > any means, they can set > the desired values at compilation time in Kconfig and it will be > impossible to change it, nor by sysctl, > nor by command-line. > > But if I remove that (!locked) check, then the command-line options > would alway be able to override > the compile-time configuration, including unlocking the locked state. That's a fair point, one way would probably be to replace "!locked" by "!IS_ENABLED(CONFIG_SECURITY_NAX_LOCKED)" in the setup_*() functions, keeping the compile-time override while preventing the commandline parameter ordering issue we discussed. However at this point I understand that you may find the current 'locked' usage easier, and this whole discussion is probably nitpicking on my part. > > Thank you. > Thanks, Simon
diff --git a/Documentation/admin-guide/LSM/NAX.rst b/Documentation/admin-guide/LSM/NAX.rst new file mode 100644 index 000000000000..b742f881f3d7 --- /dev/null +++ b/Documentation/admin-guide/LSM/NAX.rst @@ -0,0 +1,48 @@ +======= +NAX LSM +======= + +:Author: Igor Zhbanov + +NAX (No Anonymous Execution) is a Linux Security Module that extends DAC +by making impossible to make anonymous and modified pages executable for +privileged processes. The module intercepts anonymous executable pages +created with mmap() and mprotect() system calls. + +To select it at boot time, specify ``security=nax`` (though this will +disable any other LSM). + +The privileged process is a process for which any of the following is true: + +- ``uid == 0`` +- ``euid == 0`` +- ``suid == 0`` +- ``fsuid == 0`` +- ``cap_effective`` has any capability except of ``kernel.nax.allowed_caps`` +- ``cap_permitted`` has any capability except of ``kernel.nax.allowed_caps`` + +The following sysctl parameters are available: + +* ``kernel.nax.allowed_caps``: + + Hexadecimal number representing allowed capabilities set for the privileged + processes. + +* ``kernel.nax.enforcing``: + + - 0: Only log errors (when enabled by ``kernel.nax.quiet``) + - 1: Forbid unsafe pages mappings (default) + +* ``kernel.nax.locked``: + + - 0: Changing of the module's sysctl parameters is allowed + - 1: Further changing of the module's sysctl parameters is forbidden + + Setting this parameter to ``1`` after initial setup during the system boot + will prevent the module disabling at the later time. + +* ``kernel.nax.quiet``: + + - 0: Log violations (default) + - 1: Be quiet + - 2: Kill the violating process and log diff --git a/Documentation/admin-guide/LSM/index.rst b/Documentation/admin-guide/LSM/index.rst index a6ba95fbaa9f..e9df7fc9a461 100644 --- a/Documentation/admin-guide/LSM/index.rst +++ b/Documentation/admin-guide/LSM/index.rst @@ -42,6 +42,7 @@ subdirectories. apparmor LoadPin + NAX SELinux Smack tomoyo diff --git a/security/Kconfig b/security/Kconfig index 0ced7fd33e4d..771419647ae1 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -239,6 +239,7 @@ source "security/yama/Kconfig" source "security/safesetid/Kconfig" source "security/lockdown/Kconfig" source "security/landlock/Kconfig" +source "security/nax/Kconfig" source "security/integrity/Kconfig" @@ -278,11 +279,11 @@ endchoice config LSM string "Ordered list of enabled LSMs" - default "landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK - default "landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR - default "landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO - default "landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC - default "landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,smack,selinux,tomoyo,apparmor,bpf" if DEFAULT_SECURITY_SMACK + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf" if DEFAULT_SECURITY_APPARMOR + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,tomoyo,bpf" if DEFAULT_SECURITY_TOMOYO + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,bpf" if DEFAULT_SECURITY_DAC + default "nax,landlock,lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor,bpf" help A comma-separated list of LSMs, in initialization order. Any LSMs left off this list will be ignored. This can be diff --git a/security/Makefile b/security/Makefile index 47e432900e24..5c261bbf4659 100644 --- a/security/Makefile +++ b/security/Makefile @@ -14,6 +14,7 @@ subdir-$(CONFIG_SECURITY_SAFESETID) += safesetid subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown subdir-$(CONFIG_BPF_LSM) += bpf subdir-$(CONFIG_SECURITY_LANDLOCK) += landlock +subdir-$(CONFIG_SECURITY_NAX) += nax # always enable default capabilities obj-y += commoncap.o @@ -34,6 +35,7 @@ obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown/ obj-$(CONFIG_CGROUPS) += device_cgroup.o obj-$(CONFIG_BPF_LSM) += bpf/ obj-$(CONFIG_SECURITY_LANDLOCK) += landlock/ +obj-$(CONFIG_SECURITY_NAX) += nax/ # Object integrity file lists subdir-$(CONFIG_INTEGRITY) += integrity diff --git a/security/nax/Kconfig b/security/nax/Kconfig new file mode 100644 index 000000000000..60ef0964f00a --- /dev/null +++ b/security/nax/Kconfig @@ -0,0 +1,71 @@ +# SPDX-License-Identifier: GPL-2.0-only +config SECURITY_NAX + bool "NAX support" + depends on SECURITY + default n + help + This selects NAX (No Anonymous Execution), which extends DAC + support with additional system-wide security settings beyond + regular Linux discretionary access controls. Currently available + is restriction to make anonymous and modified pages executable + to privileged processes. Like capabilities, this security module + stacks with other LSMs. Further information can be found in + Documentation/admin-guide/LSM/NAX.rst. + + If you are unsure how to answer this question, answer N. + +config SECURITY_NAX_LOCKED + bool "Lock NAX settings" + depends on SECURITY_NAX + help + If selected, it will not be possible to change enforcing and quiet + settings via sysctl or the kernel command line. If not selected, + it can be enabled at boot with the kernel parameter "nax_locked=1" + or "kernel.nax_locked=1" sysctl (if the settings are not locked). + +config SECURITY_NAX_QUIET + bool "Silence NAX messages" + depends on SECURITY_NAX + help + If selected, NAX will not print violations. If not selected, it can + be enabled at boot with the kernel parameter "nax_quiet=1" or + "kernel.nax_quiet=1" sysctl (if the settings are not locked). + +choice + prompt "NAX violation action mode" + default SECURITY_NAX_MODE_LOG + depends on SECURITY_NAX + help + Select the NAX violation action mode. + + In the default permissive mode the violations are only logged + (if logging is not suppressed). In the enforcing mode the violations + are prohibited. And in the kill mode the process is terminated. + + The value can be changed at boot with the kernel parameter + "nax_mode" (0, 1, 2) or "kernel.nax_mode=" (0, 1, 2) sysctl (if the + settings are not locked). + + config SECURITY_NAX_MODE_LOG + bool "Permissive mode" + help + In this mode violations are only logged (if logging is not + suppressed). + config SECURITY_NAX_MODE_ENFORCING + bool "Enforcing mode" + help + In this mode violations are prohibited and logged (if + logging is not suppressed). + config SECURITY_NAX_MODE_KILL + bool "Kill mode" + help + In this mode the voilating process is terminated. The event + is logged (if logging is not suppressed). +endchoice + +config SECURITY_NAX_MODE + int + depends on SECURITY_NAX + default 0 if SECURITY_NAX_MODE_LOG + default 1 if SECURITY_NAX_MODE_ENFORCING + default 2 if SECURITY_NAX_MODE_KILL diff --git a/security/nax/Makefile b/security/nax/Makefile new file mode 100644 index 000000000000..9c3372210c77 --- /dev/null +++ b/security/nax/Makefile @@ -0,0 +1,4 @@ +# SPDX-License-Identifier: GPL-2.0-only +obj-$(CONFIG_SECURITY_NAX) := nax.o + +nax-y := nax-lsm.o diff --git a/security/nax/nax-lsm.c b/security/nax/nax-lsm.c new file mode 100644 index 000000000000..ef99d9b36a9d --- /dev/null +++ b/security/nax/nax-lsm.c @@ -0,0 +1,344 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (C) 2016-2021 Open Mobile Platform LLC. + * + * Written by: Igor Zhbanov <i.zhbanov@omp.ru, izh1979@gmail.com> + * + * NAX (No Anonymous Execution) Linux Security Module + * This module prevents execution of the code in anonymous or modified pages. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2, as + * published by the Free Software Foundation. */ + +#define pr_fmt(fmt) "NAX: " fmt + +#include <linux/capability.h> +#include <linux/cred.h> +#include <linux/ctype.h> +#include <linux/lsm_hooks.h> +#include <linux/mman.h> +#include <linux/sched.h> +#include <linux/securebits.h> +#include <linux/security.h> +#include <linux/sysctl.h> +#include <linux/uidgid.h> + +#define NAX_MODE_PERMISSIVE 0 /* Log only */ +#define NAX_MODE_ENFORCING 1 /* Enforce and log */ +#define NAX_MODE_KILL 2 /* Kill process and log */ + +static int mode = CONFIG_SECURITY_NAX_MODE, + quiet = IS_ENABLED(CONFIG_SECURITY_NAX_QUIET), + locked = IS_ENABLED(CONFIG_SECURITY_NAX_LOCKED); + +#define ALLOWED_CAPS_HEX_LEN (_KERNEL_CAPABILITY_U32S * 8) + +static char allowed_caps_hex[ALLOWED_CAPS_HEX_LEN + 1]; +static kernel_cap_t allowed_caps = CAP_EMPTY_SET; + +static int +is_privileged_process(void) +{ + const struct cred *cred; + kuid_t root_uid; + + cred = current_cred(); + root_uid = make_kuid(cred->user_ns, 0); + /* We count a process as privileged if it any of its IDs is zero + * or it has any unsafe capability (even in a user namespace) */ + if ((!issecure(SECURE_NOROOT) && (uid_eq(cred->uid, root_uid) || + uid_eq(cred->euid, root_uid) || + uid_eq(cred->suid, root_uid) || + uid_eq(cred->fsuid, root_uid))) || + (!cap_issubset(cred->cap_effective, allowed_caps)) || + (!cap_issubset(cred->cap_permitted, allowed_caps))) + return 1; + + return 0; +} + +static void +log_warn(const char *reason) +{ + if (quiet) + return; + + pr_warn_ratelimited("%s: pid=%d, uid=%u, comm=\"%s\"\n", + reason, current->pid, + from_kuid(&init_user_ns, current_cred()->uid), + current->comm); +} + +static void +kill_current_task(void) +{ + pr_warn("Killing pid=%d, uid=%u, comm=\"%s\"\n", + current->pid, from_kuid(&init_user_ns, current_cred()->uid), + current->comm); + force_sig(SIGKILL); +} + +static int +nax_mmap_file(struct file *file, unsigned long reqprot, + unsigned long prot, unsigned long flags) +{ + int ret = 0; + + if (mode == NAX_MODE_PERMISSIVE && quiet) + return 0; /* Skip further checks in this case */ + + if (!(prot & PROT_EXEC)) /* Not executable memory */ + return 0; + + if (!is_privileged_process()) + return 0; /* Not privileged processes can do anything */ + + if (!file) { /* Anonymous executable memory */ + log_warn("MMAP_ANON_EXEC"); + ret = -EACCES; + } else if (prot & PROT_WRITE) { /* Mapping file RWX */ + log_warn("MMAP_FILE_WRITE_EXEC"); + ret = -EACCES; + } + + if (mode == NAX_MODE_KILL) + kill_current_task(); + + return (mode != NAX_MODE_PERMISSIVE) ? ret : 0; +} + +static int +nax_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot, + unsigned long prot) +{ + if (mode == NAX_MODE_PERMISSIVE && quiet) + return 0; /* Skip further checks in this case */ + + if (!(prot & PROT_EXEC)) /* Not executable memory */ + return 0; + + if (!is_privileged_process()) + return 0; /* Not privileged processes can do anything */ + + if (!(vma->vm_flags & VM_EXEC)) { + int ret = 0; + + if (vma->vm_start >= vma->vm_mm->start_brk && + vma->vm_end <= vma->vm_mm->brk) { + log_warn("MPROTECT_EXEC_HEAP"); + ret = -EACCES; + } else if (!vma->vm_file && + ((vma->vm_start <= vma->vm_mm->start_stack && + vma->vm_end >= vma->vm_mm->start_stack) || + vma_is_stack_for_current(vma))) { + log_warn("MPROTECT_EXEC_STACK"); + ret = -EACCES; + } else if (vma->vm_file && vma->anon_vma) { + /* We are making executable a file mapping that has + * had some COW done. Since pages might have been + * written, check ability to execute the possibly + * modified content. This typically should only + * occur for text relocations. */ + log_warn("MPROTECT_EXEC_MODIFIED"); + ret = -EACCES; + } + + if (ret) { + if (mode == NAX_MODE_KILL) + kill_current_task(); + + return (mode != NAX_MODE_PERMISSIVE) ? ret : 0; + } + } + + return nax_mmap_file(vma->vm_file, reqprot, prot, + vma->vm_flags & VM_SHARED); +} + +static struct security_hook_list nax_hooks[] __lsm_ro_after_init = { + LSM_HOOK_INIT(mmap_file, nax_mmap_file), + LSM_HOOK_INIT(file_mprotect, nax_file_mprotect), +}; + +#ifdef CONFIG_SYSCTL + +static int +nax_dointvec_minmax(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + if (write && (!capable(CAP_SYS_ADMIN) || locked)) + return -EPERM; + + return proc_dointvec_minmax(table, write, buffer, lenp, ppos); +} + +static int +nax_dostring(struct ctl_table *table, int write, void *buffer, + size_t *lenp, loff_t *ppos) +{ + if (write) { + int error; + char *buf = (char *)buffer; + size_t len = *lenp, i; + kernel_cap_t caps = CAP_EMPTY_SET; + + if (!capable(CAP_SYS_ADMIN) || locked) + return -EPERM; + + /* Do not allow trailing garbage or excessive length */ + if (len == ALLOWED_CAPS_HEX_LEN + 1) { + if (buf[--len] != '\n') + return -EINVAL; + } else if (len > ALLOWED_CAPS_HEX_LEN || len <= 0) + return -EINVAL; + + if ((error = proc_dostring(table, write, buffer, lenp, ppos))) + return error; + + len = strlen(allowed_caps_hex); + for (i = 0; i < len; i++) + if (!isxdigit(allowed_caps_hex[i])) + return -EINVAL; + + for (i = 0; i < _KERNEL_CAPABILITY_U32S; i++) { + unsigned long l; + + if (kstrtoul(allowed_caps_hex + + (len >= 8 ? len - 8 : 0), 16, &l)) + return -EINVAL; + + caps.cap[i] = l; + if (len < 8) + break; + + len -= 8; + allowed_caps_hex[len] = '\0'; + } + + allowed_caps = cap_intersect(caps, CAP_FULL_SET); + return 0; + } else { + unsigned i; + + CAP_FOR_EACH_U32(i) + snprintf(allowed_caps_hex + i * 8, 9, "%08x", + allowed_caps.cap[CAP_LAST_U32 - i]); + + return proc_dostring(table, write, buffer, lenp, ppos); + } +} + +struct ctl_path nax_sysctl_path[] = { + { .procname = "kernel" }, + { .procname = "nax" }, + { } +}; + +static int max_mode = NAX_MODE_KILL; + +static struct ctl_table nax_sysctl_table[] = { + { + .procname = "allowed_caps", + .data = allowed_caps_hex, + .maxlen = ALLOWED_CAPS_HEX_LEN + 1, + .mode = 0644, + .proc_handler = nax_dostring, + }, { + .procname = "locked", + .data = &locked, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = nax_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, { + .procname = "mode", + .data = &mode, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = nax_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = &max_mode, + }, { + .procname = "quiet", + .data = &quiet, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = nax_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + { } +}; + +static void __init +nax_init_sysctl(void) +{ + if (!register_sysctl_paths(nax_sysctl_path, nax_sysctl_table)) + panic("NAX: sysctl registration failed.\n"); +} + +#else /* !CONFIG_SYSCTL */ + +static inline void +nax_init_sysctl(void) +{ + +} + +#endif /* !CONFIG_SYSCTL */ + +static int __init setup_mode(char *str) +{ + unsigned long val; + + if (!locked && !kstrtoul(str, 0, &val)) { + if (val > max_mode){ + pr_err("Invalid 'nax_mode' parameter value (%s)\n", + str); + val = max_mode; + } + + mode = val; + } + + return 1; +} +__setup("nax_mode=", setup_mode); + +static int __init setup_quiet(char *str) +{ + unsigned long val; + + if (!locked && !kstrtoul(str, 0, &val)) + quiet = val ? 1 : 0; + + return 1; +} +__setup("nax_quiet=", setup_quiet); + +static int __init setup_locked(char *str) +{ + unsigned long val; + + if (!locked && !kstrtoul(str, 0, &val)) + locked = val ? 1 : 0; + + return 1; +} +__setup("nax_locked=", setup_locked); + +static __init int +nax_init(void) +{ + pr_info("Starting.\n"); + security_add_hooks(nax_hooks, ARRAY_SIZE(nax_hooks), "nax"); + nax_init_sysctl(); + + return 0; +} + +DEFINE_LSM(nax) = { + .name = "nax", + .init = nax_init, +};
NAX (No Anonymous Execution) is a Linux Security Module that extends DAC by making impossible to make anonymous and modified pages executable for privileged processes. The module intercepts anonymous executable pages created with mmap() and mprotect() system calls. Depending on the settings, the module can block and log violating system calls or log and kill the violating process. Signed-off-by: Igor Zhbanov <i.zhbanov@omp.ru> --- Documentation/admin-guide/LSM/NAX.rst | 48 ++++ Documentation/admin-guide/LSM/index.rst | 1 + security/Kconfig | 11 +- security/Makefile | 2 + security/nax/Kconfig | 71 +++++ security/nax/Makefile | 4 + security/nax/nax-lsm.c | 344 ++++++++++++++++++++++++ 7 files changed, 476 insertions(+), 5 deletions(-) create mode 100644 Documentation/admin-guide/LSM/NAX.rst create mode 100644 security/nax/Kconfig create mode 100644 security/nax/Makefile create mode 100644 security/nax/nax-lsm.c