From patchwork Fri Nov 22 16:12:39 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13883325
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 1/6] io_uring: make task_work pending check dependent on ring type
Date: Fri, 22 Nov 2024 09:12:39 -0700
Message-ID: <20241122161645.494868-2-axboe@kernel.dk>
In-Reply-To: <20241122161645.494868-1-axboe@kernel.dk>
References: <20241122161645.494868-1-axboe@kernel.dk>

There's no need to check for generic task_work for DEFER_TASKRUN if we
have local task_work pending. This avoids dipping into the huge
task_struct to check whether we have normal task_work pending.

Signed-off-by: Jens Axboe
---
 io_uring/io_uring.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 12abee607e4a..214f9f175102 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -354,7 +354,9 @@ static inline bool io_local_work_pending(struct io_ring_ctx *ctx)
 
 static inline bool io_task_work_pending(struct io_ring_ctx *ctx)
 {
-	return task_work_pending(current) || io_local_work_pending(ctx);
+	if (ctx->flags & IORING_SETUP_DEFER_TASKRUN && io_local_work_pending(ctx))
+		return true;
+	return task_work_pending(current);
 }
 
 static inline void io_tw_lock(struct io_ring_ctx *ctx, struct io_tw_state *ts)
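For illustration only (not part of the patch): the fast path above is
specific to rings set up with IORING_SETUP_DEFER_TASKRUN. A minimal
userspace sketch of creating such a ring, assuming liburing is
installed:

#include <liburing.h>
#include <stdio.h>

int main(void)
{
	struct io_uring ring;
	int ret;

	/*
	 * DEFER_TASKRUN requires SINGLE_ISSUER; completion task_work is
	 * only run when the submitting task waits for completions.
	 */
	ret = io_uring_queue_init(64, &ring, IORING_SETUP_SINGLE_ISSUER |
					     IORING_SETUP_DEFER_TASKRUN);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}
	/* submit and wait as usual here */
	io_uring_queue_exit(&ring);
	return 0;
}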
From patchwork Fri Nov 22 16:12:40 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13883327
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 2/6] io_uring: replace defer task_work llist with io_wq_work_list
Date: Fri, 22 Nov 2024 09:12:40 -0700
Message-ID: <20241122161645.494868-3-axboe@kernel.dk>
In-Reply-To: <20241122161645.494868-1-axboe@kernel.dk>
References: <20241122161645.494868-1-axboe@kernel.dk>
List-Unsubscribe: MIME-Version: 1.0 Add a spinlock for the list, and replace the lockless llist with the work list instead. This avoids needing to reverse items in the list before running them, as the io_wq_work_list is FIFO by nature whereas the llist is LIFO. Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 13 ++- io_uring/io_uring.c | 194 ++++++++++++++++----------------- io_uring/io_uring.h | 2 +- 3 files changed, 104 insertions(+), 105 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 011860ade268..e9ba99cb0ed0 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -335,8 +335,9 @@ struct io_ring_ctx { * regularly bounce b/w CPUs. */ struct { - struct llist_head work_llist; - struct llist_head retry_llist; + struct io_wq_work_list work_list; + spinlock_t work_lock; + int work_items; unsigned long check_cq; atomic_t cq_wait_nr; atomic_t cq_timeouts; @@ -566,7 +567,11 @@ enum { typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); struct io_task_work { - struct llist_node node; + /* DEFER_TASKRUN uses work_node, regular task_work node */ + union { + struct io_wq_work_node work_node; + struct llist_node node; + }; io_req_tw_func_t func; }; @@ -622,8 +627,6 @@ struct io_kiocb { */ u16 buf_index; - unsigned nr_tw; - /* REQ_F_* flags */ io_req_flags_t flags; diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index c3a7d0197636..b7eb962e9872 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -339,7 +339,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) INIT_LIST_HEAD(&ctx->defer_list); INIT_LIST_HEAD(&ctx->timeout_list); INIT_LIST_HEAD(&ctx->ltimeout_list); - init_llist_head(&ctx->work_llist); + INIT_WQ_LIST(&ctx->work_list); + spin_lock_init(&ctx->work_lock); INIT_LIST_HEAD(&ctx->tctx_list); ctx->submit_state.free_list.next = NULL; INIT_HLIST_HEAD(&ctx->waitid_list); @@ -1066,25 +1067,31 @@ struct llist_node *io_handle_tw_list(struct llist_node *node, return node; } -static __cold void __io_fallback_tw(struct llist_node *node, bool sync) +static __cold void __io_fallback_tw(struct io_kiocb *req, bool sync, + struct io_ring_ctx **last_ctx) { + if (sync && *last_ctx != req->ctx) { + if (*last_ctx) { + flush_delayed_work(&(*last_ctx)->fallback_work); + percpu_ref_put(&(*last_ctx)->refs); + } + *last_ctx = req->ctx; + percpu_ref_get(&(*last_ctx)->refs); + } + if (llist_add(&req->io_task_work.node, &req->ctx->fallback_llist)) + schedule_delayed_work(&req->ctx->fallback_work, 1); +} + +static void io_fallback_tw(struct io_uring_task *tctx, bool sync) +{ + struct llist_node *node = llist_del_all(&tctx->task_list); struct io_ring_ctx *last_ctx = NULL; struct io_kiocb *req; while (node) { req = container_of(node, struct io_kiocb, io_task_work.node); node = node->next; - if (sync && last_ctx != req->ctx) { - if (last_ctx) { - flush_delayed_work(&last_ctx->fallback_work); - percpu_ref_put(&last_ctx->refs); - } - last_ctx = req->ctx; - percpu_ref_get(&last_ctx->refs); - } - if (llist_add(&req->io_task_work.node, - &req->ctx->fallback_llist)) - schedule_delayed_work(&req->ctx->fallback_work, 1); + __io_fallback_tw(req, sync, &last_ctx); } if (last_ctx) { @@ -1093,13 +1100,6 @@ static __cold void __io_fallback_tw(struct llist_node *node, bool sync) } } -static void io_fallback_tw(struct io_uring_task *tctx, bool sync) -{ - struct llist_node *node = llist_del_all(&tctx->task_list); - - __io_fallback_tw(node, sync); -} - struct llist_node 
*tctx_task_work_run(struct io_uring_task *tctx, unsigned int max_entries, unsigned int *count) @@ -1139,65 +1139,45 @@ void tctx_task_work(struct callback_head *cb) static inline void io_req_local_work_add(struct io_kiocb *req, struct io_ring_ctx *ctx, - unsigned flags) + unsigned tw_flags) { - unsigned nr_wait, nr_tw, nr_tw_prev; - struct llist_node *head; + unsigned nr_tw, nr_tw_prev, nr_wait; + unsigned long flags; /* See comment above IO_CQ_WAKE_INIT */ BUILD_BUG_ON(IO_CQ_WAKE_FORCE <= IORING_MAX_CQ_ENTRIES); /* - * We don't know how many reuqests is there in the link and whether - * they can even be queued lazily, fall back to non-lazy. + * We don't know how many requests are in the link and whether they can + * even be queued lazily, fall back to non-lazy. */ if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) - flags &= ~IOU_F_TWQ_LAZY_WAKE; + tw_flags &= ~IOU_F_TWQ_LAZY_WAKE; - guard(rcu)(); + spin_lock_irqsave(&ctx->work_lock, flags); + wq_list_add_tail(&req->io_task_work.work_node, &ctx->work_list); + nr_tw_prev = ctx->work_items++; + spin_unlock_irqrestore(&ctx->work_lock, flags); - head = READ_ONCE(ctx->work_llist.first); - do { - nr_tw_prev = 0; - if (head) { - struct io_kiocb *first_req = container_of(head, - struct io_kiocb, - io_task_work.node); - /* - * Might be executed at any moment, rely on - * SLAB_TYPESAFE_BY_RCU to keep it alive. - */ - nr_tw_prev = READ_ONCE(first_req->nr_tw); - } - - /* - * Theoretically, it can overflow, but that's fine as one of - * previous adds should've tried to wake the task. - */ - nr_tw = nr_tw_prev + 1; - if (!(flags & IOU_F_TWQ_LAZY_WAKE)) - nr_tw = IO_CQ_WAKE_FORCE; - - req->nr_tw = nr_tw; - req->io_task_work.node.next = head; - } while (!try_cmpxchg(&ctx->work_llist.first, &head, - &req->io_task_work.node)); - - /* - * cmpxchg implies a full barrier, which pairs with the barrier - * in set_current_state() on the io_cqring_wait() side. It's used - * to ensure that either we see updated ->cq_wait_nr, or waiters - * going to sleep will observe the work added to the list, which - * is similar to the wait/wawke task state sync. - */ + nr_tw = nr_tw_prev + 1; + if (!(tw_flags & IOU_F_TWQ_LAZY_WAKE)) + nr_tw = IO_CQ_WAKE_FORCE; - if (!head) { + if (!nr_tw_prev) { if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags); if (ctx->has_evfd) io_eventfd_signal(ctx); } + /* + * We need a barrier after unlock, which pairs with the barrier + * in set_current_state() on the io_cqring_wait() side. It's used + * to ensure that either we see updated ->cq_wait_nr, or waiters + * going to sleep will observe the work added to the list, which + * is similar to the wait/wake task state sync. 
+ */ + smp_mb(); nr_wait = atomic_read(&ctx->cq_wait_nr); /* not enough or no one is waiting */ if (nr_tw < nr_wait) @@ -1253,11 +1233,27 @@ void io_req_task_work_add_remote(struct io_kiocb *req, struct io_ring_ctx *ctx, static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx) { - struct llist_node *node = llist_del_all(&ctx->work_llist); + struct io_ring_ctx *last_ctx = NULL; + struct io_wq_work_node *node; + unsigned long flags; - __io_fallback_tw(node, false); - node = llist_del_all(&ctx->retry_llist); - __io_fallback_tw(node, false); + spin_lock_irqsave(&ctx->work_lock, flags); + node = ctx->work_list.first; + INIT_WQ_LIST(&ctx->work_list); + ctx->work_items = 0; + spin_unlock_irqrestore(&ctx->work_lock, flags); + + while (node) { + struct io_kiocb *req; + + req = container_of(node, struct io_kiocb, io_task_work.work_node); + node = node->next; + __io_fallback_tw(req, false, &last_ctx); + } + if (last_ctx) { + flush_delayed_work(&last_ctx->fallback_work); + percpu_ref_put(&last_ctx->refs); + } } static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events, @@ -1272,52 +1268,52 @@ static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events, return false; } -static int __io_run_local_work_loop(struct llist_node **node, - struct io_tw_state *ts, - int events) -{ - while (*node) { - struct llist_node *next = (*node)->next; - struct io_kiocb *req = container_of(*node, struct io_kiocb, - io_task_work.node); - INDIRECT_CALL_2(req->io_task_work.func, - io_poll_task_func, io_req_rw_complete, - req, ts); - *node = next; - if (--events <= 0) - break; - } - - return events; -} - static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts, int min_events) { - struct llist_node *node; + struct io_wq_work_node *node, *tail; + int ret, limit, nitems; unsigned int loops = 0; - int ret, limit; if (WARN_ON_ONCE(ctx->submitter_task != current)) return -EEXIST; if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags); + ret = 0; limit = max(IO_LOCAL_TW_DEFAULT_MAX, min_events); again: - ret = __io_run_local_work_loop(&ctx->retry_llist.first, ts, limit); - if (ctx->retry_llist.first) + spin_lock_irq(&ctx->work_lock); + node = ctx->work_list.first; + tail = ctx->work_list.last; + nitems = ctx->work_items; + INIT_WQ_LIST(&ctx->work_list); + ctx->work_items = 0; + spin_unlock_irq(&ctx->work_lock); + + while (node) { + struct io_kiocb *req = container_of(node, struct io_kiocb, + io_task_work.work_node); + node = node->next; + INDIRECT_CALL_2(req->io_task_work.func, + io_poll_task_func, io_req_rw_complete, + req, ts); + nitems--; + if (++ret >= limit) + break; + } + + if (unlikely(node)) { + spin_lock_irq(&ctx->work_lock); + tail->next = ctx->work_list.first; + ctx->work_list.first = node; + if (!ctx->work_list.last) + ctx->work_list.last = tail; + ctx->work_items += nitems; + spin_unlock_irq(&ctx->work_lock); goto retry_done; + } - /* - * llists are in reverse order, flip it back the right way before - * running the pending items. 
- */
-	node = llist_reverse_order(llist_del_all(&ctx->work_llist));
-	ret = __io_run_local_work_loop(&node, ts, ret);
-	ctx->retry_llist.first = node;
 	loops++;
-
-	ret = limit - ret;
 	if (io_run_local_work_continue(ctx, ret, min_events))
 		goto again;
 retry_done:
@@ -2413,7 +2409,7 @@ static enum hrtimer_restart io_cqring_min_timer_wakeup(struct hrtimer *timer)
 	if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
 		atomic_set(&ctx->cq_wait_nr, 1);
 		smp_mb();
-		if (!llist_empty(&ctx->work_llist))
+		if (io_local_work_pending(ctx))
 			goto out_wake;
 	}
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 214f9f175102..2fae27803116 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -349,7 +349,7 @@ static inline int io_run_task_work(void)
 
 static inline bool io_local_work_pending(struct io_ring_ctx *ctx)
 {
-	return !llist_empty(&ctx->work_llist) || !llist_empty(&ctx->retry_llist);
+	return READ_ONCE(ctx->work_list.first);
 }
 
 static inline bool io_task_work_pending(struct io_ring_ctx *ctx)
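For illustration only (not part of the series): a standalone sketch of
the ordering point behind this patch. Pushing onto a lock-free stack
(the way llist_add() links entries) yields LIFO order that must be
reversed before running, while a head/tail list (the way
wq_list_add_tail() links entries) is already FIFO. All names below are
illustrative.

#include <stdio.h>

struct node { int val; struct node *next; };

/* LIFO: push at the head */
static void stack_push(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

/* FIFO: append at the tail via first/last pointers */
struct fifo { struct node *first, *last; };

static void fifo_add_tail(struct fifo *f, struct node *n)
{
	n->next = NULL;
	if (f->last)
		f->last->next = n;
	else
		f->first = n;
	f->last = n;
}

int main(void)
{
	struct node s[3] = { { .val = 1 }, { .val = 2 }, { .val = 3 } };
	struct node q[3] = { { .val = 1 }, { .val = 2 }, { .val = 3 } };
	struct node *head = NULL, *it;
	struct fifo f = { 0 };
	int i;

	for (i = 0; i < 3; i++) {
		stack_push(&head, &s[i]);
		fifo_add_tail(&f, &q[i]);
	}

	printf("stack order:");	/* prints 3 2 1: needs a reverse */
	for (it = head; it; it = it->next)
		printf(" %d", it->val);
	printf("\nfifo order: ");	/* prints 1 2 3: already in order */
	for (it = f.first; it; it = it->next)
		printf(" %d", it->val);
	printf("\n");
	return 0;
}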
From patchwork Fri Nov 22 16:12:41 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13883328
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 3/6] io_uring/slist: add list-to-list list splice helper
Date: Fri, 22 Nov 2024 09:12:41 -0700
Message-ID: <20241122161645.494868-4-axboe@kernel.dk>
In-Reply-To: <20241122161645.494868-1-axboe@kernel.dk>
References: <20241122161645.494868-1-axboe@kernel.dk>

Add a helper to splice a source list to a destination list.
Signed-off-by: Jens Axboe
---
 io_uring/slist.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/io_uring/slist.h b/io_uring/slist.h
index 0eb194817242..7ac7c136b702 100644
--- a/io_uring/slist.h
+++ b/io_uring/slist.h
@@ -85,6 +85,22 @@ static inline bool wq_list_splice(struct io_wq_work_list *list,
 	return false;
 }
 
+static inline bool wq_list_splice_list(struct io_wq_work_list *src,
+				       struct io_wq_work_list *dst)
+{
+	bool ret = false;
+
+	if (wq_list_empty(dst)) {
+		*dst = *src;
+	} else {
+		dst->last->next = src->first;
+		dst->last = src->last;
+		ret = true;
+	}
+	INIT_WQ_LIST(src);
+	return ret;
+}
+
 static inline void wq_stack_add_head(struct io_wq_work_node *node,
 				     struct io_wq_work_node *stack)
 {
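For illustration only (not part of the series): a standalone sketch of
the splice semantics the helper above provides, where the return value
tells the caller whether the destination already had entries queued
(and hence whether a worker still needs to be kicked). The types and
names below are stand-ins, not the kernel ones.

#include <stdio.h>
#include <stdbool.h>

struct wnode { struct wnode *next; };
struct wlist { struct wnode *first, *last; };

#define WLIST_INIT(l)	do { (l)->first = (l)->last = NULL; } while (0)

/* Move everything on src to the tail of dst, reinitialise src, and
 * report whether dst already had entries. */
static bool wlist_splice_list(struct wlist *src, struct wlist *dst)
{
	bool dst_had_entries = dst->first != NULL;

	if (!dst_had_entries) {
		*dst = *src;
	} else {
		dst->last->next = src->first;
		dst->last = src->last;
	}
	WLIST_INIT(src);
	return dst_had_entries;
}

int main(void)
{
	struct wnode a, b, c;
	struct wlist src, dst;

	WLIST_INIT(&src);
	WLIST_INIT(&dst);
	a.next = &b; b.next = NULL;
	src.first = &a; src.last = &b;
	c.next = NULL;
	dst.first = dst.last = &c;

	if (!wlist_splice_list(&src, &dst))
		printf("destination was empty, kick the worker\n");
	else
		printf("destination already had work queued\n");
	return 0;
}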
From patchwork Fri Nov 22 16:12:42 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13883330
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 4/6] io_uring: switch non-defer task_work to io_wq_work_list
Date: Fri, 22 Nov 2024 09:12:42 -0700
Message-ID: <20241122161645.494868-5-axboe@kernel.dk>
In-Reply-To: <20241122161645.494868-1-axboe@kernel.dk>
References: <20241122161645.494868-1-axboe@kernel.dk>

Switch the normal task_work to io_wq_work_list as well, both to unify
it with the deferred task_work and to avoid needing to reverse the
ordering of the list when running it.

Note that this still keeps the manual retry list for SQPOLL task_work.
That could go away as well, as the task_work list is now fully ordered
and SQPOLL could just leave entries on there when it chops up the
running of the list.
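For illustration only (not part of the patch): the core pattern this
switch relies on, as a standalone userspace sketch using a pthread
mutex in place of the kernel spinlock. Producers append under the
lock; the consumer swaps the whole list out under the lock and then
runs it in FIFO order with the lock dropped. All names are
illustrative.

#include <pthread.h>
#include <stdio.h>

struct work { struct work *next; void (*fn)(struct work *); };
struct wqueue {
	pthread_mutex_t lock;
	struct work *first, **lastp;
};

static void queue_add_tail(struct wqueue *q, struct work *w)
{
	w->next = NULL;
	pthread_mutex_lock(&q->lock);
	*q->lastp = w;
	q->lastp = &w->next;
	pthread_mutex_unlock(&q->lock);
}

static void queue_run(struct wqueue *q)
{
	struct work *node;

	/* Swap the list out under the lock... */
	pthread_mutex_lock(&q->lock);
	node = q->first;
	q->first = NULL;
	q->lastp = &q->first;
	pthread_mutex_unlock(&q->lock);

	/* ...then run it in FIFO order with the lock dropped. */
	while (node) {
		struct work *next = node->next;

		node->fn(node);
		node = next;
	}
}

static void say_hi(struct work *w) { (void)w; printf("ran one item\n"); }

int main(void)
{
	struct wqueue q = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct work w1 = { .fn = say_hi }, w2 = { .fn = say_hi };

	q.first = NULL;
	q.lastp = &q.first;
	queue_add_tail(&q, &w1);
	queue_add_tail(&q, &w2);
	queue_run(&q);
	return 0;
}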
Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 14 ++- io_uring/io_uring.c | 167 ++++++++++++++++++++------------- io_uring/io_uring.h | 6 +- io_uring/sqpoll.c | 8 +- io_uring/tctx.c | 3 +- 5 files changed, 116 insertions(+), 82 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index e9ba99cb0ed0..7ddac4d1d4b3 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -102,7 +102,8 @@ struct io_uring_task { struct percpu_counter inflight; struct { /* task_work */ - struct llist_head task_list; + struct io_wq_work_list task_list; + spinlock_t task_lock; struct callback_head task_work; } ____cacheline_aligned_in_smp; }; @@ -390,8 +391,9 @@ struct io_ring_ctx { struct mm_struct *mm_account; /* ctx exit and cancelation */ - struct llist_head fallback_llist; - struct delayed_work fallback_work; + struct io_wq_work_list fallback_list; + spinlock_t fallback_lock; + struct work_struct fallback_work; struct work_struct exit_work; struct list_head tctx_list; struct completion ref_comp; @@ -567,11 +569,7 @@ enum { typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts); struct io_task_work { - /* DEFER_TASKRUN uses work_node, regular task_work node */ - union { - struct io_wq_work_node work_node; - struct llist_node node; - }; + struct io_wq_work_node node; io_req_tw_func_t func; }; diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index b7eb962e9872..3bb93c77ac3f 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -245,15 +245,26 @@ static __cold void io_ring_ctx_ref_free(struct percpu_ref *ref) static __cold void io_fallback_req_func(struct work_struct *work) { struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, - fallback_work.work); - struct llist_node *node = llist_del_all(&ctx->fallback_llist); - struct io_kiocb *req, *tmp; + fallback_work); + struct io_wq_work_node *node; struct io_tw_state ts = {}; + struct io_wq_work_list list; + + spin_lock_irq(&ctx->fallback_lock); + list = ctx->fallback_list; + INIT_WQ_LIST(&ctx->fallback_list); + spin_unlock_irq(&ctx->fallback_lock); percpu_ref_get(&ctx->refs); mutex_lock(&ctx->uring_lock); - llist_for_each_entry_safe(req, tmp, node, io_task_work.node) + node = list.first; + while (node) { + struct io_kiocb *req; + + req = container_of(node, struct io_kiocb, io_task_work.node); + node = node->next; req->io_task_work.func(req, &ts); + } io_submit_flush_completions(ctx); mutex_unlock(&ctx->uring_lock); percpu_ref_put(&ctx->refs); @@ -347,7 +358,9 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) #ifdef CONFIG_FUTEX INIT_HLIST_HEAD(&ctx->futex_list); #endif - INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func); + INIT_WORK(&ctx->fallback_work, io_fallback_req_func); + INIT_WQ_LIST(&ctx->fallback_list); + spin_lock_init(&ctx->fallback_lock); INIT_WQ_LIST(&ctx->submit_state.compl_reqs); INIT_HLIST_HEAD(&ctx->cancelable_uring_cmd); io_napi_init(ctx); @@ -1033,15 +1046,15 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts) * If more entries than max_entries are available, stop processing once this * is reached and return the rest of the list. 
*/ -struct llist_node *io_handle_tw_list(struct llist_node *node, - unsigned int *count, - unsigned int max_entries) +struct io_wq_work_node *io_handle_tw_list(struct io_wq_work_node *node, + unsigned int *count, + unsigned int max_entries) { struct io_ring_ctx *ctx = NULL; struct io_tw_state ts = { }; do { - struct llist_node *next = node->next; + struct io_wq_work_node *next = node->next; struct io_kiocb *req = container_of(node, struct io_kiocb, io_task_work.node); @@ -1067,55 +1080,84 @@ struct llist_node *io_handle_tw_list(struct llist_node *node, return node; } -static __cold void __io_fallback_tw(struct io_kiocb *req, bool sync, - struct io_ring_ctx **last_ctx) +static __cold void __io_fallback_schedule(struct io_ring_ctx *ctx, + struct io_wq_work_list *list, + bool sync) { - if (sync && *last_ctx != req->ctx) { - if (*last_ctx) { - flush_delayed_work(&(*last_ctx)->fallback_work); - percpu_ref_put(&(*last_ctx)->refs); - } - *last_ctx = req->ctx; - percpu_ref_get(&(*last_ctx)->refs); - } - if (llist_add(&req->io_task_work.node, &req->ctx->fallback_llist)) - schedule_delayed_work(&req->ctx->fallback_work, 1); + bool kick_work = true; + unsigned long flags; + + spin_lock_irqsave(&ctx->fallback_lock, flags); + kick_work = !wq_list_splice_list(list, &ctx->fallback_list); + spin_unlock_irqrestore(&ctx->fallback_lock, flags); + if (kick_work) + schedule_work(&ctx->fallback_work); + + if (sync) + flush_work(&ctx->fallback_work); + percpu_ref_put(&ctx->refs); } -static void io_fallback_tw(struct io_uring_task *tctx, bool sync) +static void __io_fallback_tw(struct io_wq_work_list *list, spinlock_t *lock, + bool sync) { - struct llist_node *node = llist_del_all(&tctx->task_list); + struct io_wq_work_list local_list, ctx_list; struct io_ring_ctx *last_ctx = NULL; + struct io_wq_work_node *node; struct io_kiocb *req; + unsigned long flags; + + spin_lock_irqsave(lock, flags); + local_list = *list; + INIT_WQ_LIST(list); + spin_unlock_irqrestore(lock, flags); + INIT_WQ_LIST(&ctx_list); + node = local_list.first; while (node) { + struct io_wq_work_node *next = node->next; + req = container_of(node, struct io_kiocb, io_task_work.node); - node = node->next; - __io_fallback_tw(req, sync, &last_ctx); + if (last_ctx != req->ctx) { + if (last_ctx) + __io_fallback_schedule(last_ctx, &ctx_list, sync); + last_ctx = req->ctx; + percpu_ref_get(&last_ctx->refs); + } + wq_list_add_tail(node, &ctx_list); + node = next; } - if (last_ctx) { - flush_delayed_work(&last_ctx->fallback_work); - percpu_ref_put(&last_ctx->refs); - } + if (last_ctx) + __io_fallback_schedule(last_ctx, &ctx_list, sync); +} + +static void io_fallback_tw(struct io_uring_task *tctx, bool sync) +{ + __io_fallback_tw(&tctx->task_list, &tctx->task_lock, sync); } -struct llist_node *tctx_task_work_run(struct io_uring_task *tctx, - unsigned int max_entries, - unsigned int *count) +struct io_wq_work_node *tctx_task_work_run(struct io_uring_task *tctx, + unsigned int max_entries, + unsigned int *count) { - struct llist_node *node; + struct io_wq_work_node *node; if (unlikely(current->flags & PF_EXITING)) { io_fallback_tw(tctx, true); return NULL; } - node = llist_del_all(&tctx->task_list); - if (node) { - node = llist_reverse_order(node); + if (!READ_ONCE(tctx->task_list.first)) + return NULL; + + spin_lock_irq(&tctx->task_lock); + node = tctx->task_list.first; + INIT_WQ_LIST(&tctx->task_list); + spin_unlock_irq(&tctx->task_lock); + + if (node) node = io_handle_tw_list(node, count, max_entries); - } /* relaxed read is enough as only the task itself 
sets ->in_cancel */ if (unlikely(atomic_read(&tctx->in_cancel))) @@ -1128,13 +1170,11 @@ struct llist_node *tctx_task_work_run(struct io_uring_task *tctx, void tctx_task_work(struct callback_head *cb) { struct io_uring_task *tctx; - struct llist_node *ret; unsigned int count = 0; tctx = container_of(cb, struct io_uring_task, task_work); - ret = tctx_task_work_run(tctx, UINT_MAX, &count); - /* can't happen */ - WARN_ON_ONCE(ret); + if (tctx_task_work_run(tctx, UINT_MAX, &count)) + WARN_ON_ONCE(1); } static inline void io_req_local_work_add(struct io_kiocb *req, @@ -1155,7 +1195,7 @@ static inline void io_req_local_work_add(struct io_kiocb *req, tw_flags &= ~IOU_F_TWQ_LAZY_WAKE; spin_lock_irqsave(&ctx->work_lock, flags); - wq_list_add_tail(&req->io_task_work.work_node, &ctx->work_list); + wq_list_add_tail(&req->io_task_work.node, &ctx->work_list); nr_tw_prev = ctx->work_items++; spin_unlock_irqrestore(&ctx->work_lock, flags); @@ -1192,9 +1232,16 @@ static void io_req_normal_work_add(struct io_kiocb *req) { struct io_uring_task *tctx = req->tctx; struct io_ring_ctx *ctx = req->ctx; + unsigned long flags; + bool was_empty; + + spin_lock_irqsave(&tctx->task_lock, flags); + was_empty = tctx->task_list.first == NULL; + wq_list_add_tail(&req->io_task_work.node, &tctx->task_list); + spin_unlock_irqrestore(&tctx->task_lock, flags); /* task_work already pending, we're done */ - if (!llist_add(&req->io_task_work.node, &tctx->task_list)) + if (!was_empty) return; if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) @@ -1233,27 +1280,13 @@ void io_req_task_work_add_remote(struct io_kiocb *req, struct io_ring_ctx *ctx, static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx) { - struct io_ring_ctx *last_ctx = NULL; - struct io_wq_work_node *node; - unsigned long flags; - - spin_lock_irqsave(&ctx->work_lock, flags); - node = ctx->work_list.first; - INIT_WQ_LIST(&ctx->work_list); - ctx->work_items = 0; - spin_unlock_irqrestore(&ctx->work_lock, flags); - - while (node) { - struct io_kiocb *req; - - req = container_of(node, struct io_kiocb, io_task_work.work_node); - node = node->next; - __io_fallback_tw(req, false, &last_ctx); - } - if (last_ctx) { - flush_delayed_work(&last_ctx->fallback_work); - percpu_ref_put(&last_ctx->refs); - } + /* + * __io_fallback_tw() handles lists that can have multiple + * rings in it, which isn't the case here. But it'll work just + * fine, so use it anyway rather than have a special case for + * just a single ctx. 
+ */ + __io_fallback_tw(&ctx->work_list, &ctx->work_lock, false); } static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events, @@ -1292,7 +1325,7 @@ static int __io_run_local_work(struct io_ring_ctx *ctx, struct io_tw_state *ts, while (node) { struct io_kiocb *req = container_of(node, struct io_kiocb, - io_task_work.work_node); + io_task_work.node); node = node->next; INDIRECT_CALL_2(req->io_task_work.func, io_poll_task_func, io_req_rw_complete, @@ -2967,7 +3000,7 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx) io_unregister_personality(ctx, index); mutex_unlock(&ctx->uring_lock); - flush_delayed_work(&ctx->fallback_work); + flush_work(&ctx->fallback_work); INIT_WORK(&ctx->exit_work, io_ring_exit_work); /* @@ -3106,7 +3139,7 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx, if (tctx) ret |= io_run_task_work() > 0; else - ret |= flush_delayed_work(&ctx->fallback_work); + ret |= flush_work(&ctx->fallback_work); return ret; } diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 2fae27803116..0b5181b128aa 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -91,8 +91,10 @@ void io_req_task_queue(struct io_kiocb *req); void io_req_task_complete(struct io_kiocb *req, struct io_tw_state *ts); void io_req_task_queue_fail(struct io_kiocb *req, int ret); void io_req_task_submit(struct io_kiocb *req, struct io_tw_state *ts); -struct llist_node *io_handle_tw_list(struct llist_node *node, unsigned int *count, unsigned int max_entries); -struct llist_node *tctx_task_work_run(struct io_uring_task *tctx, unsigned int max_entries, unsigned int *count); +struct io_wq_work_node *io_handle_tw_list(struct io_wq_work_node *node, + unsigned int *count, unsigned int max_entries); +struct io_wq_work_node *tctx_task_work_run(struct io_uring_task *tctx, + unsigned int max_entries, unsigned int *count); void tctx_task_work(struct callback_head *cb); __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd); int io_uring_alloc_task_context(struct task_struct *task, diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c index 6df5e649c413..615707260f25 100644 --- a/io_uring/sqpoll.c +++ b/io_uring/sqpoll.c @@ -221,7 +221,7 @@ static bool io_sqd_handle_event(struct io_sq_data *sqd) * than we were asked to process. Newly queued task_work isn't run until the * retry list has been fully processed. 
  */
-static unsigned int io_sq_tw(struct llist_node **retry_list, int max_entries)
+static unsigned int io_sq_tw(struct io_wq_work_node **retry_list, int max_entries)
 {
 	struct io_uring_task *tctx = current->io_uring;
 	unsigned int count = 0;
@@ -239,11 +239,11 @@ static unsigned int io_sq_tw(struct llist_node **retry_list, int max_entries)
 	return count;
 }
 
-static bool io_sq_tw_pending(struct llist_node *retry_list)
+static bool io_sq_tw_pending(struct io_wq_work_node *retry_list)
 {
 	struct io_uring_task *tctx = current->io_uring;
 
-	return retry_list || !llist_empty(&tctx->task_list);
+	return retry_list || READ_ONCE(tctx->task_list.first);
 }
 
 static void io_sq_update_worktime(struct io_sq_data *sqd, struct rusage *start)
@@ -259,7 +259,7 @@ static void io_sq_update_worktime(struct io_sq_data *sqd, struct rusage *start)
 
 static int io_sq_thread(void *data)
 {
-	struct llist_node *retry_list = NULL;
+	struct io_wq_work_node *retry_list = NULL;
 	struct io_sq_data *sqd = data;
 	struct io_ring_ctx *ctx;
 	struct rusage start;
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index 503f3ff8bc4f..7155b3c56c85 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -87,7 +87,8 @@ __cold int io_uring_alloc_task_context(struct task_struct *task,
 	atomic_set(&tctx->in_cancel, 0);
 	atomic_set(&tctx->inflight_tracked, 0);
 	task->io_uring = tctx;
-	init_llist_head(&tctx->task_list);
+	INIT_WQ_LIST(&tctx->task_list);
+	spin_lock_init(&tctx->task_lock);
 	init_task_work(&tctx->task_work, tctx_task_work);
 	return 0;
 }

From patchwork Fri Nov 22 16:12:43 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13883329
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 5/6] io_uring: add __tctx_task_work_run() helper
Date: Fri, 22 Nov 2024 09:12:43 -0700
Message-ID: <20241122161645.494868-6-axboe@kernel.dk>
In-Reply-To: <20241122161645.494868-1-axboe@kernel.dk>
References: <20241122161645.494868-1-axboe@kernel.dk>

Most use cases only care about running all of the task_work, and they
don't need the node passed back or the work capped. Rename the existing
helper to __tctx_task_work_run(), and add a wrapper around that for the
more basic use cases.
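For illustration only (not part of the patch): a standalone sketch of
the split this introduces. The capped variant processes at most
max_entries and hands back what it did not get to, while the plain
wrapper runs everything and returns a count. Names are illustrative,
not the kernel helpers.

#include <stdio.h>
#include <limits.h>

struct item { struct item *next; };

/* Capped variant: process at most max_entries, hand back the rest. */
static struct item *run_capped(struct item *list, unsigned int max_entries,
			       unsigned int *count)
{
	while (list && *count < max_entries) {
		struct item *next = list->next;

		/* ... process 'list' here ... */
		(*count)++;
		list = next;
	}
	return list;	/* remainder becomes the caller's retry list */
}

/* Convenience wrapper for callers that just want everything run. */
static unsigned int run_all(struct item *list)
{
	unsigned int count = 0;

	run_capped(list, UINT_MAX, &count);
	return count;
}

int main(void)
{
	struct item a, b, c;
	unsigned int count = 0;
	struct item *retry;

	a.next = &b; b.next = &c; c.next = NULL;
	retry = run_capped(&a, 2, &count);
	printf("ran %u, %s items left over\n", count, retry ? "some" : "no");
	printf("ran %u more\n", run_all(retry));
	return 0;
}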
Signed-off-by: Jens Axboe --- io_uring/io_uring.c | 18 ++++++++++++------ io_uring/io_uring.h | 9 +++------ io_uring/sqpoll.c | 2 +- 3 files changed, 16 insertions(+), 13 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 3bb93c77ac3f..bc520a67fc03 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -1137,9 +1137,9 @@ static void io_fallback_tw(struct io_uring_task *tctx, bool sync) __io_fallback_tw(&tctx->task_list, &tctx->task_lock, sync); } -struct io_wq_work_node *tctx_task_work_run(struct io_uring_task *tctx, - unsigned int max_entries, - unsigned int *count) +struct io_wq_work_node *__tctx_task_work_run(struct io_uring_task *tctx, + unsigned int max_entries, + unsigned int *count) { struct io_wq_work_node *node; @@ -1167,14 +1167,20 @@ struct io_wq_work_node *tctx_task_work_run(struct io_uring_task *tctx, return node; } +unsigned int tctx_task_work_run(struct io_uring_task *tctx) +{ + unsigned int count = 0; + + __tctx_task_work_run(tctx, UINT_MAX, &count); + return count; +} + void tctx_task_work(struct callback_head *cb) { struct io_uring_task *tctx; - unsigned int count = 0; tctx = container_of(cb, struct io_uring_task, task_work); - if (tctx_task_work_run(tctx, UINT_MAX, &count)) - WARN_ON_ONCE(1); + tctx_task_work_run(tctx); } static inline void io_req_local_work_add(struct io_kiocb *req, diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 0b5181b128aa..2b0e7c5db30d 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -93,8 +93,9 @@ void io_req_task_queue_fail(struct io_kiocb *req, int ret); void io_req_task_submit(struct io_kiocb *req, struct io_tw_state *ts); struct io_wq_work_node *io_handle_tw_list(struct io_wq_work_node *node, unsigned int *count, unsigned int max_entries); -struct io_wq_work_node *tctx_task_work_run(struct io_uring_task *tctx, +struct io_wq_work_node *__tctx_task_work_run(struct io_uring_task *tctx, unsigned int max_entries, unsigned int *count); +unsigned int tctx_task_work_run(struct io_uring_task *tctx); void tctx_task_work(struct callback_head *cb); __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd); int io_uring_alloc_task_context(struct task_struct *task, @@ -332,12 +333,8 @@ static inline int io_run_task_work(void) resume_user_mode_work(NULL); } if (current->io_uring) { - unsigned int count = 0; - __set_current_state(TASK_RUNNING); - tctx_task_work_run(current->io_uring, UINT_MAX, &count); - if (count) - ret = true; + ret = tctx_task_work_run(current->io_uring) != 0; } } if (task_work_pending(current)) { diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c index 615707260f25..aec6c2d56910 100644 --- a/io_uring/sqpoll.c +++ b/io_uring/sqpoll.c @@ -232,7 +232,7 @@ static unsigned int io_sq_tw(struct io_wq_work_node **retry_list, int max_entrie goto out; max_entries -= count; } - *retry_list = tctx_task_work_run(tctx, max_entries, &count); + *retry_list = __tctx_task_work_run(tctx, max_entries, &count); out: if (task_work_pending(current)) task_work_run(); From patchwork Fri Nov 22 16:12:44 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13883331 Received: from mail-oa1-f50.google.com (mail-oa1-f50.google.com [209.85.160.50]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2AB6F1DE2BE for ; Fri, 22 Nov 2024 16:16:58 +0000 (UTC) Authentication-Results: 
From patchwork Fri Nov 22 16:12:44 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13883331
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 6/6] io_uring: make __tctx_task_work_run() take an io_wq_work_list
Date: Fri, 22 Nov 2024 09:12:44 -0700
Message-ID: <20241122161645.494868-7-axboe@kernel.dk>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20241122161645.494868-1-axboe@kernel.dk>
References: <20241122161645.494868-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

The normal task_work logic doesn't really need it, as it always runs all of
the pending work. But for SQPOLL, it can now pass in its retry_list, which
simplifies tracking task_work runs that were split across calls. This avoids
passing io_wq_work_node pointers around.

Rather than pass in a list, SQPOLL could re-add the leftover items to the
generic task_work list. But that would require re-taking task_lock and
reusing task_list for it, whereas a separate retry list skips both steps.
The downside is that two lists now need checking, but that's how it was
before as well.

Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 36 ++++++++++++++++--------------------
 io_uring/io_uring.h |  9 +++++----
 io_uring/sqpoll.c   | 20 +++++++++++---------
 3 files changed, 32 insertions(+), 33 deletions(-)
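To see the trade-off concretely, here is a small userspace model of the scheme, with a pthread mutex standing in for task_lock and invented names rather than io_uring's API: pending entries are spliced from the shared list into a list the caller owns, and anything left over after a capped run simply stays on that private list, so resuming later needs no further locking, whereas pushing leftovers back onto the shared list would mean taking the lock again.

/*
 * Userspace sketch of the retry-list trade-off described above; all names
 * are illustrative, and a pthread mutex stands in for task_lock.
 */
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

struct node { struct node *next; int id; };
struct work_list { struct node *first; };

static struct work_list shared = { NULL };	/* producers append under the lock */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Splice everything queued so far into the caller's private list: O(1). */
static void grab_all(struct work_list *priv)
{
	pthread_mutex_lock(&shared_lock);
	*priv = shared;
	shared.first = NULL;
	pthread_mutex_unlock(&shared_lock);
}

/* Handle at most max_entries items; anything left stays on *priv, unlocked. */
static unsigned int run_some(struct work_list *priv, unsigned int max_entries)
{
	unsigned int count = 0;

	while (priv->first && count < max_entries) {
		struct node *node = priv->first;

		priv->first = node->next;
		printf("ran %d\n", node->id);
		count++;
	}
	return count;
}

int main(void)
{
	struct node n3 = { NULL, 3 }, n2 = { &n3, 2 }, n1 = { &n2, 1 };
	struct work_list retry = { NULL };

	shared.first = &n1;
	grab_all(&retry);
	run_some(&retry, 2);	/* item 3 stays on the private retry list */
	run_some(&retry, 2);	/* finished later without touching shared_lock */
	return 0;
}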
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index bc520a67fc03..5e52d8db3dca 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1044,20 +1044,20 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, struct io_tw_state *ts)
 /*
  * Run queued task_work, returning the number of entries processed in *count.
  * If more entries than max_entries are available, stop processing once this
- * is reached and return the rest of the list.
+ * is reached.
  */
-struct io_wq_work_node *io_handle_tw_list(struct io_wq_work_node *node,
-					  unsigned int *count,
-					  unsigned int max_entries)
+void io_handle_tw_list(struct io_wq_work_list *list, unsigned int *count,
+		       unsigned int max_entries)
 {
 	struct io_ring_ctx *ctx = NULL;
 	struct io_tw_state ts = { };
 
 	do {
-		struct io_wq_work_node *next = node->next;
+		struct io_wq_work_node *node = list->first;
 		struct io_kiocb *req = container_of(node, struct io_kiocb,
 						    io_task_work.node);
 
+		list->first = node->next;
 		if (req->ctx != ctx) {
 			ctx_flush_and_put(ctx, &ts);
 			ctx = req->ctx;
@@ -1067,17 +1067,15 @@ struct io_wq_work_node *io_handle_tw_list(struct io_wq_work_node *node,
 		INDIRECT_CALL_2(req->io_task_work.func,
 				io_poll_task_func,
 				io_req_rw_complete, req, &ts);
-		node = next;
 		(*count)++;
 		if (unlikely(need_resched())) {
 			ctx_flush_and_put(ctx, &ts);
 			ctx = NULL;
 			cond_resched();
 		}
-	} while (node && *count < max_entries);
+	} while (list->first && *count < max_entries);
 
 	ctx_flush_and_put(ctx, &ts);
-	return node;
 }
 
 static __cold void __io_fallback_schedule(struct io_ring_ctx *ctx,
@@ -1137,41 +1135,39 @@ static void io_fallback_tw(struct io_uring_task *tctx, bool sync)
 	__io_fallback_tw(&tctx->task_list, &tctx->task_lock, sync);
 }
 
-struct io_wq_work_node *__tctx_task_work_run(struct io_uring_task *tctx,
-					     unsigned int max_entries,
-					     unsigned int *count)
+void __tctx_task_work_run(struct io_uring_task *tctx,
+			  struct io_wq_work_list *list,
+			  unsigned int max_entries, unsigned int *count)
 {
-	struct io_wq_work_node *node;
-
 	if (unlikely(current->flags & PF_EXITING)) {
 		io_fallback_tw(tctx, true);
-		return NULL;
+		return;
 	}
 
 	if (!READ_ONCE(tctx->task_list.first))
-		return NULL;
+		return;
 
 	spin_lock_irq(&tctx->task_lock);
-	node = tctx->task_list.first;
+	*list = tctx->task_list;
 	INIT_WQ_LIST(&tctx->task_list);
 	spin_unlock_irq(&tctx->task_lock);
 
-	if (node)
-		node = io_handle_tw_list(node, count, max_entries);
+	if (!wq_list_empty(list))
+		io_handle_tw_list(list, count, max_entries);
 
 	/* relaxed read is enough as only the task itself sets ->in_cancel */
 	if (unlikely(atomic_read(&tctx->in_cancel)))
 		io_uring_drop_tctx_refs(current);
 
 	trace_io_uring_task_work_run(tctx, *count);
-	return node;
 }
 
 unsigned int tctx_task_work_run(struct io_uring_task *tctx)
 {
+	struct io_wq_work_list list;
 	unsigned int count = 0;
 
-	__tctx_task_work_run(tctx, UINT_MAX, &count);
+	__tctx_task_work_run(tctx, &list, UINT_MAX, &count);
 	return count;
 }
 
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 2b0e7c5db30d..74b1468aefda 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -91,10 +91,11 @@ void io_req_task_queue(struct io_kiocb *req);
 void io_req_task_complete(struct io_kiocb *req, struct io_tw_state *ts);
 void io_req_task_queue_fail(struct io_kiocb *req, int ret);
 void io_req_task_submit(struct io_kiocb *req, struct io_tw_state *ts);
-struct io_wq_work_node *io_handle_tw_list(struct io_wq_work_node *node,
-		unsigned int *count, unsigned int max_entries);
-struct io_wq_work_node *__tctx_task_work_run(struct io_uring_task *tctx,
-		unsigned int max_entries, unsigned int *count);
+void io_handle_tw_list(struct io_wq_work_list *list, unsigned int *count,
+		       unsigned int max_entries);
+void __tctx_task_work_run(struct io_uring_task *tctx,
+			  struct io_wq_work_list *list, unsigned int max_entries,
+			  unsigned int *count);
 unsigned int tctx_task_work_run(struct io_uring_task *tctx);
 void tctx_task_work(struct callback_head *cb);
 __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd);
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index aec6c2d56910..3cd50369db5a 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -221,29 +221,29 @@ static bool io_sqd_handle_event(struct io_sq_data *sqd)
  * than we were asked to process. Newly queued task_work isn't run until the
  * retry list has been fully processed.
  */
-static unsigned int io_sq_tw(struct io_wq_work_node **retry_list, int max_entries)
+static unsigned int io_sq_tw(struct io_wq_work_list *retry_list, int max_entries)
 {
 	struct io_uring_task *tctx = current->io_uring;
 	unsigned int count = 0;
 
-	if (*retry_list) {
-		*retry_list = io_handle_tw_list(*retry_list, &count, max_entries);
+	if (!wq_list_empty(retry_list)) {
+		io_handle_tw_list(retry_list, &count, max_entries);
 		if (count >= max_entries)
 			goto out;
 		max_entries -= count;
 	}
-	*retry_list = __tctx_task_work_run(tctx, max_entries, &count);
+	__tctx_task_work_run(tctx, retry_list, max_entries, &count);
out:
 	if (task_work_pending(current))
 		task_work_run();
 	return count;
 }
 
-static bool io_sq_tw_pending(struct io_wq_work_node *retry_list)
+static bool io_sq_tw_pending(struct io_wq_work_list *retry_list)
 {
 	struct io_uring_task *tctx = current->io_uring;
 
-	return retry_list || READ_ONCE(tctx->task_list.first);
+	return !wq_list_empty(retry_list) || !wq_list_empty(&tctx->task_list);
 }
 
 static void io_sq_update_worktime(struct io_sq_data *sqd, struct rusage *start)
@@ -259,7 +259,7 @@ static void io_sq_update_worktime(struct io_sq_data *sqd, struct rusage *start)
 
 static int io_sq_thread(void *data)
 {
-	struct io_wq_work_node *retry_list = NULL;
+	struct io_wq_work_list retry_list;
 	struct io_sq_data *sqd = data;
 	struct io_ring_ctx *ctx;
 	struct rusage start;
@@ -292,6 +292,7 @@ static int io_sq_thread(void *data)
 	audit_uring_entry(IORING_OP_NOP);
 	audit_uring_exit(true, 0);
 
+	INIT_WQ_LIST(&retry_list);
 	mutex_lock(&sqd->lock);
 	while (1) {
 		bool cap_entries, sqt_spin = false;
@@ -332,7 +333,8 @@ static int io_sq_thread(void *data)
 		}
 
 		prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
-		if (!io_sqd_events_pending(sqd) && !io_sq_tw_pending(retry_list)) {
+		if (!io_sqd_events_pending(sqd) &&
+		    !io_sq_tw_pending(&retry_list)) {
 			bool needs_sched = true;
 
 			list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
@@ -371,7 +373,7 @@ static int io_sq_thread(void *data)
 		timeout = jiffies + sqd->sq_thread_idle;
 	}
 
-	if (retry_list)
+	if (!wq_list_empty(&retry_list))
 		io_sq_tw(&retry_list, UINT_MAX);
 
 	io_uring_cancel_generic(true, sqd);
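With that conversion, the SQPOLL side is a two-phase, budgeted drain plus a two-list idle check. A compressed, lock-free userspace model of the flow follows; the names are illustrative only and none of these functions are the kernel's.

/*
 * Userspace model of the io_sq_tw()/io_sq_tw_pending() pair: a capped pass
 * serves the private retry list first, then whatever can be pulled from the
 * queue within the remaining budget; "pending" means either list has entries.
 */
#include <stdbool.h>
#include <stdio.h>

struct item { struct item *next; int id; };
struct list { struct item *first; };

static struct list queue;	/* stands in for the shared per-task list */

/* handle at most max_entries items from *l, leftovers stay on the list */
static unsigned int drain(struct list *l, unsigned int max_entries)
{
	unsigned int count = 0;

	while (l->first && count < max_entries) {
		struct item *it = l->first;

		l->first = it->next;
		printf("ran %d\n", it->id);
		count++;
	}
	return count;
}

/* retry list first, then top it up from the queue within the budget */
static unsigned int sq_tw(struct list *retry, unsigned int max_entries)
{
	unsigned int count = drain(retry, max_entries);

	if (count >= max_entries)
		return count;
	*retry = queue;		/* grab the queue; leftovers stay on retry */
	queue.first = NULL;
	return count + drain(retry, max_entries - count);
}

/* the thread may only go idle when both lists are empty */
static bool sq_tw_pending(const struct list *retry)
{
	return retry->first || queue.first;
}

int main(void)
{
	struct item i3 = { NULL, 3 }, i2 = { &i3, 2 }, i1 = { &i2, 1 };
	struct list retry = { NULL };	/* initialised before the main loop */

	queue.first = &i1;
	while (sq_tw_pending(&retry))
		sq_tw(&retry, 2);	/* one capped pass per loop iteration */
	return 0;
}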