From patchwork Wed Apr 9 13:35:19 2025
X-Patchwork-Id: 14044649
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, brauner@kernel.org, linux-fsdevel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 1/5] fs: gate final fput task_work on PF_NO_TASKWORK
Date: Wed, 9 Apr 2025 07:35:19 -0600
Message-ID: <20250409134057.198671-2-axboe@kernel.dk>
In-Reply-To: <20250409134057.198671-1-axboe@kernel.dk>
References: <20250409134057.198671-1-axboe@kernel.dk>

fput currently gates whether or not a task can run task_work on the
PF_KTHREAD flag, which excludes kernel threads, as they don't usually
run task_work since they never exit to userspace. This punts the final
fput done from a kthread to a delayed work item instead of using
task_work.

It's perfectly viable to have the final fput done by the kthread itself,
as long as it will actually run the task_work. Add a PF_NO_TASKWORK flag
which is set by default for a kernel thread, and gate the task_work fput
on that instead. This enables a kernel thread to clear this flag
temporarily while putting files, as long as it runs its task_work
manually.

This enables users like io_uring, when the final fput of a file is done
as part of ring teardown, to run the local task_work and hence know that
all files have been properly put, without needing to resort to workqueue
flushing tricks which can deadlock.

No functional changes in this patch.

Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Christian Brauner <brauner@kernel.org>
---
 fs/file_table.c       | 2 +-
 include/linux/sched.h | 2 +-
 kernel/fork.c         | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index c04ed94cdc4b..e3c3dd1b820d 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -521,7 +521,7 @@ static void __fput_deferred(struct file *file)
 		return;
 	}
 
-	if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
+	if (likely(!in_interrupt() && !(task->flags & PF_NO_TASKWORK))) {
 		init_task_work(&file->f_task_work, ____fput);
 		if (!task_work_add(task, &file->f_task_work, TWA_RESUME))
 			return;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f96ac1982893..349c993fc32b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1736,7 +1736,7 @@ extern struct pid *cad_pid;
 						 * I am cleaning dirty pages from some other bdi. */
 #define PF_KTHREAD		0x00200000	/* I am a kernel thread */
 #define PF_RANDOMIZE		0x00400000	/* Randomize virtual address space */
-#define PF__HOLE__00800000	0x00800000
+#define PF_NO_TASKWORK		0x00800000	/* task doesn't run task_work */
 #define PF__HOLE__01000000	0x01000000
 #define PF__HOLE__02000000	0x02000000
 #define PF_NO_SETAFFINITY	0x04000000	/* Userland is not allowed to meddle with cpus_mask */
diff --git a/kernel/fork.c b/kernel/fork.c
index c4b26cd8998b..8dd0b8a5348d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2261,7 +2261,7 @@ __latent_entropy struct task_struct *copy_process(
 		goto fork_out;
 	p->flags &= ~PF_KTHREAD;
 	if (args->kthread)
-		p->flags |= PF_KTHREAD;
+		p->flags |= PF_KTHREAD | PF_NO_TASKWORK;
 	if (args->user_worker) {
 		/*
 		 * Mark us a user worker, and block any signal that isn't
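
To illustrate the intended usage: a minimal sketch, not taken from the
patch, of how a kernel thread could temporarily allow task_work while
putting files. kthread_put_files() here is a hypothetical caller; patch
2 below adds io_uring-private helpers that do this same bracketing:

	static void kthread_put_files(struct file **files, int nr)
	{
		int i;

		/* temporarily opt in to task_work on this kthread */
		current->flags &= ~PF_NO_TASKWORK;

		for (i = 0; i < nr; i++)
			fput(files[i]);	/* final fput now queues as task_work */

		/* run what was queued, then opt back out */
		while (task_work_pending(current))
			task_work_run();
		current->flags |= PF_NO_TASKWORK;
	}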
From patchwork Wed Apr 9 13:35:20 2025
X-Patchwork-Id: 14044648
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, brauner@kernel.org, linux-fsdevel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 2/5] io_uring: mark exit side kworkers as task_work capable
Date: Wed, 9 Apr 2025 07:35:20 -0600
Message-ID: <20250409134057.198671-3-axboe@kernel.dk>
In-Reply-To: <20250409134057.198671-1-axboe@kernel.dk>
References: <20250409134057.198671-1-axboe@kernel.dk>

There are two types of work here:

1) Fallback work, if the task is exiting
2) The exit side cancelations

and both of them may do the final fput() of a file. When this happens,
fput() will schedule delayed work. This slows down exits when io_uring
needs to wait for that work to finish. It is possible to flush this via
flush_delayed_fput(), but that's a big hammer, as other unrelated files
could be involved, and from other tasks as well.

Add two io_uring helpers to temporarily clear PF_NO_TASKWORK for the
worker threads, and run any queued task_work before setting the flag
again. Then we can ensure we only flush related items that received
their final fput as part of work cancelation and flushing.

For now these are io_uring private, but could obviously be made
generically available, should there be a need to do so.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index c6209fe44cb1..bff99e185217 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -238,6 +238,20 @@ static inline void io_req_add_to_cache(struct io_kiocb *req, struct io_ring_ctx *ctx)
 	wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
 }
 
+static __cold void io_kworker_tw_start(void)
+{
+	if (WARN_ON_ONCE(!(current->flags & PF_NO_TASKWORK)))
+		return;
+	current->flags &= ~PF_NO_TASKWORK;
+}
+
+static __cold void io_kworker_tw_end(void)
+{
+	while (task_work_pending(current))
+		task_work_run();
+	current->flags |= PF_NO_TASKWORK;
+}
+
 static __cold void io_ring_ctx_ref_free(struct percpu_ref *ref)
 {
 	struct io_ring_ctx *ctx = container_of(ref, struct io_ring_ctx, refs);
@@ -253,6 +267,8 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 	struct io_kiocb *req, *tmp;
 	struct io_tw_state ts = {};
 
+	io_kworker_tw_start();
+
 	percpu_ref_get(&ctx->refs);
 	mutex_lock(&ctx->uring_lock);
 	llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
@@ -260,6 +276,7 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 	io_submit_flush_completions(ctx);
 	mutex_unlock(&ctx->uring_lock);
 	percpu_ref_put(&ctx->refs);
+	io_kworker_tw_end();
 }
 
 static int io_alloc_hash_table(struct io_hash_table *table, unsigned bits)
@@ -2876,6 +2893,8 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 	struct io_tctx_node *node;
 	int ret;
 
+	io_kworker_tw_start();
+
 	/*
	 * If we're doing polled IO and end up having requests being
	 * submitted async (out-of-line), then completions can come in while
@@ -2932,6 +2951,8 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 	 */
 	} while (!wait_for_completion_interruptible_timeout(&ctx->ref_comp, interval));
 
+	io_kworker_tw_end();
+
 	init_completion(&exit.completion);
 	init_task_work(&exit.task_work, io_tctx_exit_cb);
 	exit.ctx = ctx;
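
The same bracketing applies to any exit-side work item that may drop the
last reference to a file. A minimal sketch of a workqueue callback using
the two helpers above (my_exit_work() and do_cancelations() are
hypothetical names, not from the patch):

	static void my_exit_work(struct work_struct *work)
	{
		io_kworker_tw_start();	/* clear PF_NO_TASKWORK */

		/*
		 * Teardown that may do the final fput() of files; those
		 * fputs now queue as task_work on this kworker instead
		 * of going to the shared delayed-fput list.
		 */
		do_cancelations();

		io_kworker_tw_end();	/* run queued task_work, restore flag */
	}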
From patchwork Wed Apr 9 13:35:21 2025
X-Patchwork-Id: 14044650
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, brauner@kernel.org, linux-fsdevel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 3/5] io_uring: consider ring dead once the ref is marked dying
Date: Wed, 9 Apr 2025 07:35:21 -0600
Message-ID: <20250409134057.198671-4-axboe@kernel.dk>
In-Reply-To: <20250409134057.198671-1-axboe@kernel.dk>
References: <20250409134057.198671-1-axboe@kernel.dk>

For queueing work to io-wq or adding normal task_work, io_uring will
cancel the work items if the task is going away. If the ring is starting
to go through teardown, the ref is marked as dying. Use that as well for
the fallback/cancel mechanism.

For deferred task_work, this is done out-of-line as part of the exit
work handling. Hence it doesn't need any extra checks in the hot path.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index bff99e185217..ce00b616e138 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -555,7 +555,8 @@ static void io_queue_iowq(struct io_kiocb *req)
 	 * procedure rather than attempt to run this request (or create a new
 	 * worker for it).
 	 */
-	if (WARN_ON_ONCE(!same_thread_group(tctx->task, current)))
+	if (WARN_ON_ONCE(!same_thread_group(tctx->task, current) ||
+			 percpu_ref_is_dying(&req->ctx->refs)))
 		atomic_or(IO_WQ_WORK_CANCEL, &req->work.flags);
 
 	trace_io_uring_queue_async_work(req, io_wq_is_hashed(&req->work));
@@ -1246,7 +1247,8 @@ static void io_req_normal_work_add(struct io_kiocb *req)
 		return;
 	}
 
-	if (likely(!task_work_add(tctx->task, &tctx->task_work, ctx->notify_method)))
+	if (!percpu_ref_is_dying(&ctx->refs) &&
+	    !task_work_add(tctx->task, &tctx->task_work, ctx->notify_method))
 		return;
 
 	io_fallback_tw(tctx, false);
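
For context on the new checks: percpu_ref_is_dying() becomes true as
soon as the ref is killed at the start of teardown, well before the
count actually drops to zero. A minimal sketch of the relevant
percpu_ref lifecycle, using the existing API (release_cb is a
placeholder for the ring's release callback):

	percpu_ref_init(&ctx->refs, release_cb, 0, GFP_KERNEL);
	percpu_ref_get(&ctx->refs);	/* per-request references */
	percpu_ref_kill(&ctx->refs);	/* is_dying is true from here on */
	percpu_ref_put(&ctx->refs);	/* release_cb runs once count hits zero */

So both gates above route new work to the cancel/fallback path for the
whole teardown window, not just after the last reference is gone.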
From patchwork Wed Apr 9 13:35:22 2025
X-Patchwork-Id: 14044651
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, brauner@kernel.org, linux-fsdevel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 4/5] io_uring: wait for cancelations on final ring put
Date: Wed, 9 Apr 2025 07:35:22 -0600
Message-ID: <20250409134057.198671-5-axboe@kernel.dk>
In-Reply-To: <20250409134057.198671-1-axboe@kernel.dk>
References: <20250409134057.198671-1-axboe@kernel.dk>

We still offload the cancelation to a workqueue, so as not to introduce
dependencies between the exiting task waiting on cleanup, and that task
needing to run task_work to complete the process.

This means that once the final ring put is done, any request that was
inflight and needed cancelation will be done as well. Notably that
includes requests that hold references to files: once the ring fd close
is done, we will have dropped any of those references too.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/io_uring_types.h |  2 ++
 io_uring/io_uring.c            | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index b44d201520d8..4d26aef281fb 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -450,6 +450,8 @@ struct io_ring_ctx {
 	struct io_mapped_region			param_region;
 	/* just one zcrx per ring for now, will move to io_zcrx_ifq eventually */
 	struct io_mapped_region			zcrx_region;
+
+	struct completion			*exit_comp;
 };
 
 /*
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ce00b616e138..4b3e3ff774d6 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2891,6 +2891,7 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
 	unsigned long timeout = jiffies + HZ * 60 * 5;
 	unsigned long interval = HZ / 20;
+	struct completion *exit_comp;
 	struct io_tctx_exit exit;
 	struct io_tctx_node *node;
 	int ret;
@@ -2955,6 +2956,10 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 
 	io_kworker_tw_end();
 
+	exit_comp = READ_ONCE(ctx->exit_comp);
+	if (exit_comp)
+		complete(exit_comp);
+
 	init_completion(&exit.completion);
 	init_task_work(&exit.task_work, io_tctx_exit_cb);
 	exit.ctx = ctx;
@@ -3017,9 +3022,21 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 
 static int io_uring_release(struct inode *inode, struct file *file)
 {
 	struct io_ring_ctx *ctx = file->private_data;
+	DECLARE_COMPLETION_ONSTACK(exit_comp);
 
 	file->private_data = NULL;
+	WRITE_ONCE(ctx->exit_comp, &exit_comp);
 	io_ring_ctx_wait_and_kill(ctx);
+
+	/*
+	 * Wait for cancel to run before exiting task
+	 */
+	do {
+		if (current->io_uring)
+			io_fallback_tw(current->io_uring, false);
+		cond_resched();
+	} while (wait_for_completion_interruptible(&exit_comp));
+
 	return 0;
 }
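
One note on the wait loop above: wait_for_completion_interruptible()
returns nonzero when interrupted by a signal, so the do/while simply
retries, and each pass first runs io_fallback_tw() on the task's own
context, which appears intended to flush fallback work the exit worker
may itself be waiting on. The generic shape of this stack-completion
handoff, as a sketch (names here are local to the example):

	DECLARE_COMPLETION_ONSTACK(done);

	WRITE_ONCE(shared->done, &done);	/* publish before kicking the worker */
	queue_work(system_unbound_wq, &shared->work);

	do {
		flush_local_backlog();		/* hypothetical */
		cond_resched();
	} while (wait_for_completion_interruptible(&done));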
From patchwork Wed Apr 9 13:35:23 2025
X-Patchwork-Id: 14044652
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, brauner@kernel.org, linux-fsdevel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 5/5] io_uring: switch away from percpu refcounts
Date: Wed, 9 Apr 2025 07:35:23 -0600
Message-ID: <20250409134057.198671-6-axboe@kernel.dk>
In-Reply-To: <20250409134057.198671-1-axboe@kernel.dk>
References: <20250409134057.198671-1-axboe@kernel.dk>

For the common cases, the io_uring ref counts are all batched and hence
need not be percpu references. This saves some memory and, more
importantly, it gets rid of the full RCU grace period needed to tear
down the reference. With io_uring now waiting on cancelations and IO
during exit, that grace period slows the teardown down a lot, up to
100x.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/io_uring_types.h |  2 +-
 io_uring/io_uring.c            | 47 ++++++++++++----------------------
 io_uring/io_uring.h            |  3 ++-
 io_uring/msg_ring.c            |  4 +--
 io_uring/refs.h                | 43 +++++++++++++++++++++++++++++++
 io_uring/register.c            |  2 +-
 io_uring/rw.c                  |  2 +-
 io_uring/sqpoll.c              |  2 +-
 io_uring/zcrx.c                |  4 +--
 9 files changed, 70 insertions(+), 39 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 4d26aef281fb..bcafd7cc8c26 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -256,7 +256,7 @@ struct io_ring_ctx {
 		struct task_struct	*submitter_task;
 		struct io_rings		*rings;
-		struct percpu_ref	refs;
+		atomic_long_t		refs;
 
 		clockid_t		clockid;
 		enum tk_offsets		clock_offset;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4b3e3ff774d6..8b2f8a081ef6 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -252,13 +252,6 @@ static __cold void io_kworker_tw_end(void)
 	current->flags |= PF_NO_TASKWORK;
 }
 
-static __cold void io_ring_ctx_ref_free(struct percpu_ref *ref)
-{
-	struct io_ring_ctx *ctx = container_of(ref, struct io_ring_ctx, refs);
-
-	complete(&ctx->ref_comp);
-}
-
 static __cold void io_fallback_req_func(struct work_struct *work)
 {
 	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
@@ -269,13 +262,13 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 
 	io_kworker_tw_start();
 
-	percpu_ref_get(&ctx->refs);
+	io_ring_ref_get(ctx);
 	mutex_lock(&ctx->uring_lock);
 	llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
 		req->io_task_work.func(req, ts);
 	io_submit_flush_completions(ctx);
 	mutex_unlock(&ctx->uring_lock);
-	percpu_ref_put(&ctx->refs);
+	io_ring_ref_put(ctx);
 	io_kworker_tw_end();
 }
 
@@ -333,10 +326,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	hash_bits = clamp(hash_bits, 1, 8);
 	if (io_alloc_hash_table(&ctx->cancel_table, hash_bits))
 		goto err;
-	if (percpu_ref_init(&ctx->refs, io_ring_ctx_ref_free,
-			    0, GFP_KERNEL))
-		goto err;
+	io_ring_ref_init(ctx);
 	ctx->flags = p->flags;
 	ctx->hybrid_poll_time = LLONG_MAX;
 	atomic_set(&ctx->cq_wait_nr, IO_CQ_WAKE_INIT);
@@ -360,7 +351,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	ret |= io_futex_cache_init(ctx);
 	ret |= io_rsrc_cache_init(ctx);
 	if (ret)
-		goto free_ref;
+		goto err;
 	init_completion(&ctx->ref_comp);
 	xa_init_flags(&ctx->personalities, XA_FLAGS_ALLOC1);
 	mutex_init(&ctx->uring_lock);
@@ -386,9 +377,6 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	mutex_init(&ctx->mmap_lock);
 
 	return ctx;
-
-free_ref:
-	percpu_ref_exit(&ctx->refs);
 err:
 	io_free_alloc_caches(ctx);
 	kvfree(ctx->cancel_table.hbs);
@@ -556,7 +544,7 @@ static void io_queue_iowq(struct io_kiocb *req)
 	 * worker for it).
 	 */
 	if (WARN_ON_ONCE(!same_thread_group(tctx->task, current) ||
-			 percpu_ref_is_dying(&req->ctx->refs)))
+			 io_ring_ref_is_dying(req->ctx)))
 		atomic_or(IO_WQ_WORK_CANCEL, &req->work.flags);
 
 	trace_io_uring_queue_async_work(req, io_wq_is_hashed(&req->work));
@@ -991,7 +979,7 @@ __cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx)
 		ret = 1;
 	}
 
-	percpu_ref_get_many(&ctx->refs, ret);
+	io_ring_ref_get_many(ctx, ret);
 	while (ret--) {
 		struct io_kiocb *req = reqs[ret];
 
@@ -1046,7 +1034,7 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, io_tw_token_t tw)
 	io_submit_flush_completions(ctx);
 	mutex_unlock(&ctx->uring_lock);
 
-	percpu_ref_put(&ctx->refs);
+	io_ring_ref_put(ctx);
 }
 
 /*
@@ -1070,7 +1058,7 @@ struct llist_node *io_handle_tw_list(struct llist_node *node,
 			ctx_flush_and_put(ctx, ts);
 			ctx = req->ctx;
 			mutex_lock(&ctx->uring_lock);
-			percpu_ref_get(&ctx->refs);
+			io_ring_ref_get(ctx);
 		}
 		INDIRECT_CALL_2(req->io_task_work.func,
 				io_poll_task_func, io_req_rw_complete,
@@ -1099,10 +1087,10 @@ static __cold void __io_fallback_tw(struct llist_node *node, bool sync)
 		if (sync && last_ctx != req->ctx) {
 			if (last_ctx) {
 				flush_delayed_work(&last_ctx->fallback_work);
-				percpu_ref_put(&last_ctx->refs);
+				io_ring_ref_put(last_ctx);
 			}
 			last_ctx = req->ctx;
-			percpu_ref_get(&last_ctx->refs);
+			io_ring_ref_get(last_ctx);
 		}
 		if (llist_add(&req->io_task_work.node, &req->ctx->fallback_llist))
@@ -1111,7 +1099,7 @@ static __cold void __io_fallback_tw(struct llist_node *node, bool sync)
 
 	if (last_ctx) {
 		flush_delayed_work(&last_ctx->fallback_work);
-		percpu_ref_put(&last_ctx->refs);
+		io_ring_ref_put(last_ctx);
 	}
 }
 
@@ -1247,7 +1235,7 @@ static void io_req_normal_work_add(struct io_kiocb *req)
 		return;
 	}
 
-	if (!percpu_ref_is_dying(&ctx->refs) &&
+	if (!io_ring_ref_is_dying(ctx) &&
 	    !task_work_add(tctx->task, &tctx->task_work, ctx->notify_method))
 		return;
 
@@ -2736,7 +2724,7 @@ static void io_req_caches_free(struct io_ring_ctx *ctx)
 		nr++;
 	}
 	if (nr)
-		percpu_ref_put_many(&ctx->refs, nr);
+		io_ring_ref_put_many(ctx, nr);
 	mutex_unlock(&ctx->uring_lock);
 }
 
@@ -2770,7 +2758,6 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
 		static_branch_dec(&io_key_has_sqarray);
 
-	percpu_ref_exit(&ctx->refs);
 	free_uid(ctx->user);
 	io_req_caches_free(ctx);
 	if (ctx->hash_map)
@@ -2795,7 +2782,7 @@ static __cold void io_activate_pollwq_cb(struct callback_head *cb)
 	 * might've been lost due to loose synchronisation.
 	 */
 	wake_up_all(&ctx->poll_wq);
-	percpu_ref_put(&ctx->refs);
+	io_ring_ref_put(ctx);
 }
 
 __cold void io_activate_pollwq(struct io_ring_ctx *ctx)
@@ -2813,9 +2800,9 @@ __cold void io_activate_pollwq(struct io_ring_ctx *ctx)
 	 * only need to sync with it, which is done by injecting a tw
 	 */
 	init_task_work(&ctx->poll_wq_task_work, io_activate_pollwq_cb);
-	percpu_ref_get(&ctx->refs);
+	io_ring_ref_get(ctx);
 	if (task_work_add(ctx->submitter_task, &ctx->poll_wq_task_work, TWA_SIGNAL))
-		percpu_ref_put(&ctx->refs);
+		io_ring_ref_put(ctx);
 out:
 	spin_unlock(&ctx->completion_lock);
 }
@@ -3002,7 +2989,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 	struct creds *creds;
 
 	mutex_lock(&ctx->uring_lock);
-	percpu_ref_kill(&ctx->refs);
+	io_ring_ref_kill(ctx);
 	xa_for_each(&ctx->personalities, index, creds)
 		io_unregister_personality(ctx, index);
 	mutex_unlock(&ctx->uring_lock);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index e4050b2d0821..f8500221dd82 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -13,6 +13,7 @@
 #include "slist.h"
 #include "filetable.h"
 #include "opdef.h"
+#include "refs.h"
 
 #ifndef CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
@@ -142,7 +143,7 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
 		 * Not from an SQE, as those cannot be submitted, but via
 		 * updating tagged resources.
 		 */
-		if (!percpu_ref_is_dying(&ctx->refs))
+		if (!io_ring_ref_is_dying(ctx))
 			lockdep_assert(current == ctx->submitter_task);
 	}
 #endif
diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index 50a958e9c921..00d6a9ed2431 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -83,7 +83,7 @@ static void io_msg_tw_complete(struct io_kiocb *req, io_tw_token_t tw)
 	}
 	if (req)
 		kmem_cache_free(req_cachep, req);
-	percpu_ref_put(&ctx->refs);
+	io_ring_ref_put(ctx);
 }
 
 static int io_msg_remote_post(struct io_ring_ctx *ctx, struct io_kiocb *req,
@@ -96,7 +96,7 @@ static int io_msg_remote_post(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	req->opcode = IORING_OP_NOP;
 	req->cqe.user_data = user_data;
 	io_req_set_res(req, res, cflags);
-	percpu_ref_get(&ctx->refs);
+	io_ring_ref_get(ctx);
 	req->ctx = ctx;
 	req->tctx = NULL;
 	req->io_task_work.func = io_msg_tw_complete;
diff --git a/io_uring/refs.h b/io_uring/refs.h
index 0d928d87c4ed..9cda88a0a0d5 100644
--- a/io_uring/refs.h
+++ b/io_uring/refs.h
@@ -59,4 +59,47 @@ static inline void io_req_set_refcount(struct io_kiocb *req)
 {
 	__io_req_set_refcount(req, 1);
 }
+
+#define IO_RING_REF_DEAD	(1UL << (BITS_PER_LONG - 1))
+#define IO_RING_REF_MASK	(~IO_RING_REF_DEAD)
+
+static inline bool io_ring_ref_is_dying(struct io_ring_ctx *ctx)
+{
+	return atomic_long_read(&ctx->refs) & IO_RING_REF_DEAD;
+}
+
+static inline void io_ring_ref_put_many(struct io_ring_ctx *ctx, int nr_refs)
+{
+	unsigned long refs;
+
+	refs = atomic_long_sub_return(nr_refs, &ctx->refs);
+	if (!(refs & IO_RING_REF_MASK))
+		complete(&ctx->ref_comp);
+}
+
+static inline void io_ring_ref_put(struct io_ring_ctx *ctx)
+{
+	io_ring_ref_put_many(ctx, 1);
+}
+
+static inline void io_ring_ref_kill(struct io_ring_ctx *ctx)
+{
+	atomic_long_xor(IO_RING_REF_DEAD, &ctx->refs);
+	io_ring_ref_put(ctx);
+}
+
+static inline void io_ring_ref_init(struct io_ring_ctx *ctx)
+{
+	atomic_long_set(&ctx->refs, 1);
+}
+
+static inline void io_ring_ref_get_many(struct io_ring_ctx *ctx, int nr_refs)
+{
+	atomic_long_add(nr_refs, &ctx->refs);
+}
+
+static inline void io_ring_ref_get(struct io_ring_ctx *ctx)
+{
+	atomic_long_inc(&ctx->refs);
+}
 #endif
diff --git a/io_uring/register.c b/io_uring/register.c
index cc23a4c205cd..54fe94a0101b 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -637,7 +637,7 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 	 * We don't quiesce the refs for register anymore and so it can't be
 	 * dying as we're holding a file ref here.
 	 */
-	if (WARN_ON_ONCE(percpu_ref_is_dying(&ctx->refs)))
+	if (WARN_ON_ONCE(io_ring_ref_is_dying(ctx)))
 		return -ENXIO;
 
 	if (ctx->submitter_task && ctx->submitter_task != current)
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 039e063f7091..e010d548edea 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -496,7 +496,7 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
 	 * Don't attempt to reissue from that path, just let it fail with
 	 * -EAGAIN.
 	 */
-	if (percpu_ref_is_dying(&ctx->refs))
+	if (io_ring_ref_is_dying(ctx))
 		return false;
 
 	io_meta_restore(io, &rw->kiocb);
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index d037cc68e9d3..b71f8d52386e 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -184,7 +184,7 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 	 * Don't submit if refs are dying, good for io_uring_register(),
 	 * but also it is relied upon by io_ring_exit_work()
 	 */
-	if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) &&
+	if (to_submit && likely(!io_ring_ref_is_dying(ctx)) &&
 	    !(ctx->flags & IORING_SETUP_R_DISABLED))
 		ret = io_submit_sqes(ctx, to_submit);
 	mutex_unlock(&ctx->uring_lock);
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 0f46e0404c04..e8dbed7b8171 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -630,7 +630,7 @@ static int io_pp_zc_init(struct page_pool *pp)
 	if (pp->p.dma_dir != DMA_FROM_DEVICE)
 		return -EOPNOTSUPP;
 
-	percpu_ref_get(&ifq->ctx->refs);
+	io_ring_ref_get(ifq->ctx);
 	return 0;
 }
 
@@ -641,7 +641,7 @@ static void io_pp_zc_destroy(struct page_pool *pp)
 	if (WARN_ON_ONCE(area->free_count != area->nia.num_niovs))
 		return;
 
-	percpu_ref_put(&ifq->ctx->refs);
+	io_ring_ref_put(ifq->ctx);
 }
 
 static int io_pp_nl_fill(void *mp_priv, struct sk_buff *rsp,
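
The scheme added in refs.h packs the dying flag and the reference count
into one word: the top bit marks the ring as dead (io_ring_ref_kill()
sets it with an xor, then drops the init reference), and a put that
leaves the masked count at zero completes ctx->ref_comp. A userspace
model of the same arithmetic, as a sketch (all names are local to the
example):

	#include <stdatomic.h>
	#include <stdio.h>

	#define REF_DEAD	(1UL << (sizeof(unsigned long) * 8 - 1))
	#define REF_MASK	(~REF_DEAD)

	static atomic_ulong refs;

	static void ref_put(void)
	{
		unsigned long v = atomic_fetch_sub(&refs, 1) - 1;

		if (!(v & REF_MASK))
			printf("count hit zero: complete(&ref_comp)\n");
	}

	int main(void)
	{
		atomic_store(&refs, 1);			/* io_ring_ref_init() */
		atomic_fetch_add(&refs, 1);		/* io_ring_ref_get() */
		atomic_fetch_xor(&refs, REF_DEAD);	/* io_ring_ref_kill()... */
		ref_put();				/* ...drops the init ref */
		ref_put();				/* last user put -> zero */
		return 0;
	}

Keeping the flag in the same word means io_ring_ref_is_dying() and the
get/put paths each stay a single atomic operation, and tearing the ref
down no longer needs an RCU grace period, which is the win the commit
message describes.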