[v4] io_uring: add a sysctl to disable io_uring system-wide

Message ID x49wmxuub14.fsf@segfault.boston.devel.redhat.com (mailing list archive)
State New

Commit Message

Jeff Moyer Aug. 16, 2023, 5:55 p.m. UTC
From: Matteo Rizzo <matteorizzo@google.com>

Introduce a new sysctl (io_uring_disabled) which can be either 0, 1, or
2. When 0 (the default), all processes are allowed to create io_uring
instances, which is the current behavior.  When 1, io_uring creation is
disabled (io_uring_setup() will fail with -EPERM) for processes not in
the kernel.io_uring_group group.  When 2, calls to io_uring_setup() fail
with -EPERM regardless of privilege.

Signed-off-by: Matteo Rizzo <matteorizzo@google.com>
[JEM: modified to add io_uring_group]
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>

---
v4:

* Add a kernel.io_uring_group sysctl to hold a group id that is allowed
  to use io_uring.  One thing worth pointing out is that, when a group
  is specified, only users in that group can create an io_uring.  That
  means that if the root user is not in that group, root can not make
  use of io_uring.

  I also wrote unit tests for liburing.  I'll post that as well if there
  is consensus on this approach.

  Matteo, you didn't reply to Jens' message about pulling the patch, so
  I figured you got busy, so I picked up the patch.  I hope you're okay
  with the signoff.

v3:

* Fix the commit message
* Use READ_ONCE in io_uring_allowed to avoid races
* Add reviews

v2:

* Documentation style fixes
* Add a third level that only disables io_uring for unprivileged
  processes
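
The v4 semantics described in the commit message can be modeled in plain userspace C. This is only a sketch for reasoning about the policy, not kernel code: `struct caller`, `IO_URING_GROUP_UNSET`, and `io_uring_allowed_v4()` are hypothetical names, and capabilities and group membership are reduced to booleans supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the default kernel.io_uring_group value (-1). */
enum { IO_URING_GROUP_UNSET = -1 };

/* Simplified view of the calling process (assumption: two booleans suffice). */
struct caller {
	bool cap_sys_admin;	/* process holds CAP_SYS_ADMIN */
	bool in_group;		/* process is a member of kernel.io_uring_group */
};

/* Userspace model of the v4 decision; mirrors the order of checks in the patch. */
static bool io_uring_allowed_v4(int disabled, int group, struct caller c)
{
	assert(disabled >= 0 && disabled <= 2);

	if (disabled == 0)
		return true;		/* everyone may create rings */
	if (disabled == 2)
		return false;		/* nobody may, regardless of privilege */

	/* disabled == 1: with no group configured, fall back to CAP_SYS_ADMIN */
	if (group == IO_URING_GROUP_UNSET)
		return c.cap_sys_admin;

	/* a group is configured: membership alone decides, even for root */
	return c.in_group;
}
```

Under this model, a caller with CAP_SYS_ADMIN that is outside a configured group is rejected at level 1, which is the behavior the v4 changelog calls out.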

Comments

Gabriel Krisman Bertazi Aug. 16, 2023, 6:10 p.m. UTC | #1
Jeff Moyer <jmoyer@redhat.com> writes:

> From: Matteo Rizzo <matteorizzo@google.com>
>
> Introduce a new sysctl (io_uring_disabled) which can be either 0, 1, or
> 2. When 0 (the default), all processes are allowed to create io_uring
> instances, which is the current behavior.  When 1, io_uring creation is
> disabled (io_uring_setup() will fail with -EPERM) for processes not in
> the kernel.io_uring_group group.  When 2, calls to io_uring_setup() fail
> with -EPERM regardless of privilege.
>
> Signed-off-by: Matteo Rizzo <matteorizzo@google.com>
> [JEM: modified to add io_uring_group]
> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
>
> ---
> v4:
>
> * Add a kernel.io_uring_group sysctl to hold a group id that is allowed
>   to use io_uring.  One thing worth pointing out is that, when a group
>   is specified, only users in that group can create an io_uring.  That
>   means that if the root user is not in that group, root can not make
>   use of io_uring.

Rejecting root if it's not in the group doesn't make much sense to
me. Of course, root can always just add itself to the group, so it is
not a security feature. But I'd expect 'sudo <smth>' to not start giving
EPERM based on user group settings.  Can you make CAP_SYS_ADMIN
always allowed for option 1?

>   I also wrote unit tests for liburing.  I'll post that as well if there
>   is consensus on this approach.

I'm fine with this approach as it allows me to easily reject non-root users.

Jeff Moyer Aug. 16, 2023, 6:21 p.m. UTC | #2
Gabriel Krisman Bertazi <krisman@suse.de> writes:

> Jeff Moyer <jmoyer@redhat.com> writes:
>
>> From: Matteo Rizzo <matteorizzo@google.com>
>>
>> Introduce a new sysctl (io_uring_disabled) which can be either 0, 1, or
>> 2. When 0 (the default), all processes are allowed to create io_uring
>> instances, which is the current behavior.  When 1, io_uring creation is
>> disabled (io_uring_setup() will fail with -EPERM) for processes not in
>> the kernel.io_uring_group group.  When 2, calls to io_uring_setup() fail
>> with -EPERM regardless of privilege.
>>
>> Signed-off-by: Matteo Rizzo <matteorizzo@google.com>
>> [JEM: modified to add io_uring_group]
>> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
>>
>> ---
>> v4:
>>
>> * Add a kernel.io_uring_group sysctl to hold a group id that is allowed
>>   to use io_uring.  One thing worth pointing out is that, when a group
>>   is specified, only users in that group can create an io_uring.  That
>>   means that if the root user is not in that group, root can not make
>>   use of io_uring.
>
> Rejecting root if it's not in the group doesn't make much sense to
> me. Of course, root can always just add itself to the group, so it is
> not a security feature. But I'd expect 'sudo <smth>' to not start giving
> EPERM based on user group settings.  Can you make CAP_SYS_ADMIN
> always allowed for option 1?

Yes, that's easy to do.  I'd like to gather more opinions on this before
changing it, though.

>>   I also wrote unit tests for liburing.  I'll post that as well if there
>>   is consensus on this approach.
>
> I'm fine with this approach as it allows me to easily reject non-root users.

Thanks for taking a look!

-Jeff

Matteo Rizzo Aug. 21, 2023, 12:29 p.m. UTC | #3
On Wed, 16 Aug 2023 at 19:50, Jeff Moyer <jmoyer@redhat.com> wrote:
>   Matteo, you didn't reply to Jens' message about pulling the patch, so
>   I figured you got busy, so I picked up the patch.  I hope you're okay
>   with the signoff.

Hi, yeah sorry I was on vacation until today so I didn't see the thread.
Thanks for picking this up! I would agree with Gabriel that option 1
should also allow processes which have CAP_SYS_ADMIN.

--
Matteo
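
The variant requested in the comments above (option 1 also admits any process with CAP_SYS_ADMIN) could look roughly like the following userspace model. It is a sketch of the suggestion only, not the code in this v4 patch: `io_uring_allowed_with_admin()` and its boolean parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the suggested policy. The booleans stand in for
 * capable(CAP_SYS_ADMIN), gid_valid() and in_group_p() in the kernel.
 */
static bool io_uring_allowed_with_admin(int disabled, bool cap_sys_admin,
					bool group_valid, bool in_group)
{
	assert(disabled >= 0 && disabled <= 2);

	if (disabled == 2)
		return false;		/* level 2 still rejects everyone */
	if (disabled == 0 || cap_sys_admin)
		return true;		/* level 1 never rejects CAP_SYS_ADMIN */

	/* level 1, unprivileged caller: admitted only via a configured group */
	return group_valid && in_group;
}
```

The only behavioral difference from v4 is at level 1 with a configured group: a privileged process outside the group is accepted instead of receiving -EPERM.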

Patch

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 3800fab1619b..dc4b19f2f2cb 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -450,6 +450,34 @@  this allows system administrators to override the
 ``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.
 
 
+io_uring_disabled
+=================
+
+Controls which processes may create new io_uring instances. Restricting
+creation shrinks the kernel's attack surface.
+
+= ======================================================================
+0 All processes can create io_uring instances as normal. This is the
+  default setting.
+1 io_uring creation is disabled (io_uring_setup() will fail with -EPERM)
+  for processes not in the io_uring_group group.  Existing io_uring
+  instances can still be used.  See the documentation for io_uring_group
+  for more information.
+2 io_uring creation is disabled for all processes. io_uring_setup()
+  always fails with -EPERM. Existing io_uring instances can still be
+  used.
+= ======================================================================
+
+
+io_uring_group
+==============
+
+When io_uring_disabled is set to 1, a process must be in the
+io_uring_group group in order to create an io_uring instance.  If
+io_uring_group is set to -1 (the default), only processes with the
+CAP_SYS_ADMIN capability may create io_uring instances.
+
+
 kexec_load_disabled
 ===================
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 93db3e4e7b68..fbee37fb9bad 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -152,6 +152,31 @@  static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
 
 struct kmem_cache *req_cachep;
 
+static int __read_mostly sysctl_io_uring_disabled;
+static int __read_mostly sysctl_io_uring_group = -1;
+
+#ifdef CONFIG_SYSCTL
+static struct ctl_table kernel_io_uring_disabled_table[] = {
+	{
+		.procname	= "io_uring_disabled",
+		.data		= &sysctl_io_uring_disabled,
+		.maxlen		= sizeof(sysctl_io_uring_disabled),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_TWO,
+	},
+	{
+		.procname	= "io_uring_group",
+		.data		= &sysctl_io_uring_group,
+		.maxlen		= sizeof(gid_t),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{},
+};
+#endif
+
 struct sock *io_uring_get_socket(struct file *file)
 {
 #if defined(CONFIG_UNIX)
@@ -4040,9 +4065,31 @@  static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 	return io_uring_create(entries, &p, params);
 }
 
+static inline bool io_uring_allowed(void)
+{
+	int disabled = READ_ONCE(sysctl_io_uring_disabled);
+	kgid_t io_uring_group;
+
+	if (disabled == 0)
+		return true;
+
+	if (disabled == 2)
+		return false;
+
+	/* default to root only */
+	io_uring_group = make_kgid(&init_user_ns, sysctl_io_uring_group);
+	if (!gid_valid(io_uring_group))
+		return capable(CAP_SYS_ADMIN);
+
+	return in_group_p(io_uring_group);
+}
+
 SYSCALL_DEFINE2(io_uring_setup, u32, entries,
 		struct io_uring_params __user *, params)
 {
+	if (!io_uring_allowed())
+		return -EPERM;
+
 	return io_uring_setup(entries, params);
 }
 
@@ -4617,6 +4664,11 @@  static int __init io_uring_init(void)
 
 	req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
 				SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU);
+
+#ifdef CONFIG_SYSCTL
+	register_sysctl_init("kernel", kernel_io_uring_disabled_table);
+#endif
+
 	return 0;
 };
 __initcall(io_uring_init);
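
With the table registered under "kernel", the setting should be visible to userspace as /proc/sys/kernel/io_uring_disabled. A minimal sketch of how a program might read and interpret it follows; `read_sysctl_int()` and `describe_io_uring_disabled()` are hypothetical helpers, and the description strings merely paraphrase the documentation hunk above.

```c
#include <stdio.h>

/*
 * Read one integer from a sysctl-style file.
 * Returns 0 on success, -1 if the file is missing or unparsable
 * (for example, on a kernel without this patch).
 */
static int read_sysctl_int(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Map an io_uring_disabled value to a short human-readable description. */
static const char *describe_io_uring_disabled(int value)
{
	switch (value) {
	case 0:
		return "all processes may create io_uring instances";
	case 1:
		return "creation restricted, see kernel.io_uring_group";
	case 2:
		return "io_uring_setup() fails with EPERM for all processes";
	default:
		return "unknown value";
	}
}
```

A caller would pass "/proc/sys/kernel/io_uring_disabled" to `read_sysctl_int()` and treat a -1 return as "sysctl not available" rather than as a policy value.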