Message ID | x49wmxuub14.fsf@segfault.boston.devel.redhat.com (mailing list archive) |
---|---|
State | New |
Series | [v4] io_uring: add a sysctl to disable io_uring system-wide |
Jeff Moyer <jmoyer@redhat.com> writes:

> From: Matteo Rizzo <matteorizzo@google.com>
>
> Introduce a new sysctl (io_uring_disabled) which can be either 0, 1, or
> 2. When 0 (the default), all processes are allowed to create io_uring
> instances, which is the current behavior. When 1, io_uring creation is
> disabled (io_uring_setup() will fail with -EPERM) for processes not in
> the kernel.io_uring_group group. When 2, calls to io_uring_setup() fail
> with -EPERM regardless of privilege.
>
> Signed-off-by: Matteo Rizzo <matteorizzo@google.com>
> [JEM: modified to add io_uring_group]
> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
>
> ---
> v4:
>
> * Add a kernel.io_uring_group sysctl to hold a group id that is allowed
>   to use io_uring. One thing worth pointing out is that, when a group
>   is specified, only users in that group can create an io_uring. That
>   means that if the root user is not in that group, root can not make
>   use of io_uring.

Rejecting root if it's not in the group doesn't make much sense to
me. Of course, root can always just add itself to the group, so it is
not a security feature. But I'd expect 'sudo <smth>' to not start giving
EPERM based on user group settings. Can you make CAP_SYS_ADMIN
always allowed for option 1?

> I also wrote unit tests for liburing. I'll post that as well if there
> is consensus on this approach.

I'm fine with this approach as it allow me to easily reject non-root users.
Gabriel Krisman Bertazi <krisman@suse.de> writes:

> Jeff Moyer <jmoyer@redhat.com> writes:
>
>> From: Matteo Rizzo <matteorizzo@google.com>
>>
>> Introduce a new sysctl (io_uring_disabled) which can be either 0, 1, or
>> 2. When 0 (the default), all processes are allowed to create io_uring
>> instances, which is the current behavior. When 1, io_uring creation is
>> disabled (io_uring_setup() will fail with -EPERM) for processes not in
>> the kernel.io_uring_group group. When 2, calls to io_uring_setup() fail
>> with -EPERM regardless of privilege.
>>
>> Signed-off-by: Matteo Rizzo <matteorizzo@google.com>
>> [JEM: modified to add io_uring_group]
>> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
>>
>> ---
>> v4:
>>
>> * Add a kernel.io_uring_group sysctl to hold a group id that is allowed
>>   to use io_uring. One thing worth pointing out is that, when a group
>>   is specified, only users in that group can create an io_uring. That
>>   means that if the root user is not in that group, root can not make
>>   use of io_uring.
>
> Rejecting root if it's not in the group doesn't make much sense to
> me. Of course, root can always just add itself to the group, so it is
> not a security feature. But I'd expect 'sudo <smth>' to not start giving
> EPERM based on user group settings. Can you make CAP_SYS_ADMIN
> always allowed for option 1?

Yes, that's easy to do. I'd like to gather more opinions on this before
changing it, though.

>> I also wrote unit tests for liburing. I'll post that as well if there
>> is consensus on this approach.
>
> I'm fine with this approach as it allow me to easily reject non-root users.

Thanks for taking a look!

-Jeff
On Wed, 16 Aug 2023 at 19:50, Jeff Moyer <jmoyer@redhat.com> wrote:
> Matteo, you didn't reply to Jens' message about pulling the patch, so
> I figured you got busy, so I picked up the patch. I hope you're okay
> with the signoff.

Hi, yeah sorry I was on vacation until today so I didn't see the thread.
Thanks for picking this up!

I would agree with Gabriel that option 1 should also allow processes
which have CAP_SYS_ADMIN.

--
Matteo
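To make the point under discussion concrete, here is a minimal userspace sketch of the two policies for option 1: the check as posted in v4, and the variant suggested in review where CAP_SYS_ADMIN is always allowed. This is a model only, not kernel code. The names `task_model`, `allowed_v4`, and `allowed_with_admin_bypass` are hypothetical, and the boolean fields merely stand in for the kernel's `capable(CAP_SYS_ADMIN)` and `in_group_p()` results.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the calling task's credentials. */
struct task_model {
	bool cap_sys_admin;	/* models capable(CAP_SYS_ADMIN) */
	bool in_group;		/* models in_group_p(io_uring_group) */
};

/*
 * Models the v4 check as posted. group_valid is false when
 * kernel.io_uring_group is left at -1 (no valid gid configured).
 */
bool allowed_v4(int disabled, bool group_valid, struct task_model t)
{
	assert(disabled >= 0 && disabled <= 2);	/* sysctl range is 0..2 */

	if (disabled == 0)
		return true;
	if (disabled == 2)
		return false;
	if (!group_valid)
		return t.cap_sys_admin;	/* default: root only */
	return t.in_group;		/* root outside the group is rejected */
}

/*
 * Models the variant suggested in review (an assumption about how it
 * might look): CAP_SYS_ADMIN is always allowed when disabled == 1.
 */
bool allowed_with_admin_bypass(int disabled, bool group_valid,
			       struct task_model t)
{
	assert(disabled >= 0 && disabled <= 2);

	if (disabled == 0)
		return true;
	if (disabled == 2)
		return false;
	if (t.cap_sys_admin)
		return true;
	return group_valid && t.in_group;
}
```

The two functions differ in exactly one case: a task with CAP_SYS_ADMIN that is not a member of the configured group is rejected by the v4 model and accepted by the variant.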
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 3800fab1619b..dc4b19f2f2cb 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -450,6 +450,34 @@ this allows system administrators to override the
 ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
 
 
+io_uring_disabled
+=================
+
+Prevents all processes from creating new io_uring instances. Enabling this
+shrinks the kernel's attack surface.
+
+= ======================================================================
+0 All processes can create io_uring instances as normal. This is the
+  default setting.
+1 io_uring creation is disabled (io_uring_setup() will fail with -EPERM)
+  for processes not in the io_uring_group group. Existing io_uring
+  instances can still be used. See the documentation for io_uring_group
+  for more information.
+2 io_uring creation is disabled for all processes. io_uring_setup()
+  always fails with -EPERM. Existing io_uring instances can still be
+  used.
+= ======================================================================
+
+
+io_uring_group
+==============
+
+When io_uring_disabled is set to 1, a process must be in the
+io_uring_group group in order to create an io_uring instance. If
+io_uring_group is set to -1 (the default), only processes with the
+CAP_SYS_ADMIN capability may create io_uring instances.
+
+
 kexec_load_disabled
 ===================
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 93db3e4e7b68..fbee37fb9bad 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -152,6 +152,31 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
 
 struct kmem_cache *req_cachep;
 
+static int __read_mostly sysctl_io_uring_disabled;
+static int __read_mostly sysctl_io_uring_group = -1;
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table kernel_io_uring_disabled_table[] = {
+	{
+		.procname	= "io_uring_disabled",
+		.data		= &sysctl_io_uring_disabled,
+		.maxlen		= sizeof(sysctl_io_uring_disabled),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_TWO,
+	},
+	{
+		.procname	= "io_uring_group",
+		.data		= &sysctl_io_uring_group,
+		.maxlen		= sizeof(gid_t),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{},
+};
+#endif
+
 struct sock *io_uring_get_socket(struct file *file)
 {
 #if defined(CONFIG_UNIX)
@@ -4040,9 +4065,31 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 	return io_uring_create(entries, &p, params);
 }
 
+static inline bool io_uring_allowed(void)
+{
+	int disabled = READ_ONCE(sysctl_io_uring_disabled);
+	kgid_t io_uring_group;
+
+	if (disabled == 0)
+		return true;
+
+	if (disabled == 2)
+		return false;
+
+	/* default to root only */
+	io_uring_group = make_kgid(&init_user_ns, sysctl_io_uring_group);
+	if (!gid_valid(io_uring_group))
+		return capable(CAP_SYS_ADMIN);
+
+	return in_group_p(io_uring_group);
+}
+
 SYSCALL_DEFINE2(io_uring_setup, u32, entries,
 		struct io_uring_params __user *, params)
 {
+	if (!io_uring_allowed())
+		return -EPERM;
+
 	return io_uring_setup(entries, params);
 }
 
@@ -4617,6 +4664,11 @@ static int __init io_uring_init(void)
 
 	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
 				SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU);
+
+#ifdef CONFIG_SYSCTL
+	register_sysctl_init("kernel", kernel_io_uring_disabled_table);
+#endif
+
 	return 0;
 };
 __initcall(io_uring_init);
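Because the table is registered under "kernel", the two knobs should appear as /proc/sys/kernel/io_uring_disabled and /proc/sys/kernel/io_uring_group on a kernel carrying this patch. A rough userspace sketch for inspecting them follows. The helper names `read_int_sysctl` and `describe_io_uring_disabled` are hypothetical and not part of the patch or of liburing, and the description strings paraphrase the documentation hunk above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one integer from a sysctl file such as
 * /proc/sys/kernel/io_uring_disabled.
 * Returns 0 on success, -1 if the file is missing or unparsable
 * (for example on a kernel without this patch). *value is left
 * untouched when the file cannot be opened.
 */
int read_int_sysctl(const char *path, int *value)
{
	FILE *f;
	int parsed;

	assert(path != NULL && value != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;
	parsed = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return parsed ? 0 : -1;
}

/* Map an io_uring_disabled value to a short description. */
const char *describe_io_uring_disabled(int mode)
{
	switch (mode) {
	case 0:
		return "all processes may create io_uring instances";
	case 1:
		return "restricted to members of kernel.io_uring_group";
	case 2:
		return "creation disabled for all processes";
	default:
		return "unknown value";
	}
}
```

A caller might read both files and, when io_uring_disabled is 1 and io_uring_group is -1, report that only CAP_SYS_ADMIN holders can create rings, as the documentation hunk describes.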