From patchwork Tue Aug 30 12:50:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dylan Yudaken X-Patchwork-Id: 12959291 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F357DECAAA1 for ; Tue, 30 Aug 2022 12:50:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229983AbiH3Mup (ORCPT ); Tue, 30 Aug 2022 08:50:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43356 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229954AbiH3Muo (ORCPT ); Tue, 30 Aug 2022 08:50:44 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E92A1B07D7 for ; Tue, 30 Aug 2022 05:50:43 -0700 (PDT) Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27U7a6aP008996 for ; Tue, 30 Aug 2022 05:50:43 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=OQHEKzKyP7ZsHmnHUwkSBWz9zshEZmyZlwADqUvl80I=; b=F4UWWKG1TsQZve/2QrQ/NYpNsukSBK5JcQL7LV9s539Yvo/uZa/LMaEPVB0s4wJhm83K Uqfk62slLv4oOMLBFmPQMzlJxqMocnh+Rp+LCYm/7Dm+xB1SoZlQNnoQ2Pcg3GN3KL+B UVrxzeFFbmV7Ra3l/Y7eNVs0cgx5U3AUyCc= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9e9ysbge-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 05:50:43 -0700 Received: from twshared0646.06.ash9.facebook.com (2620:10d:c085:208::f) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 05:50:41 -0700 Received: by devbig038.lla2.facebook.com (Postfix, from userid 572232) id DEC8F55BF50A; Tue, 30 Aug 2022 05:50:30 -0700 (PDT) From: Dylan Yudaken To: Jens Axboe , Pavel Begunkov , CC: , Dylan Yudaken Subject: [PATCH for-next v4 1/7] io_uring: remove unnecessary variable Date: Tue, 30 Aug 2022 05:50:07 -0700 Message-ID: <20220830125013.570060-2-dylany@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830125013.570060-1-dylany@fb.com> References: <20220830125013.570060-1-dylany@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: gvOuY4TegKqxxoNPMEmGWGd6Pliy5SjV X-Proofpoint-GUID: gvOuY4TegKqxxoNPMEmGWGd6Pliy5SjV X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_07,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org 'running' is set once and read once, so can easily just remove it Signed-off-by: Dylan Yudaken --- io_uring/io_uring.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 77616279000b..41eaf5ec70df 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -1052,12 +1052,9 @@ void io_req_task_work_add(struct io_kiocb *req) struct io_uring_task *tctx = req->task->io_uring; struct io_ring_ctx *ctx = req->ctx; struct llist_node *node; - bool running; - - running = 
!llist_add(&req->io_task_work.node, &tctx->task_list); /* task_work already pending, we're done */ - if (running) + if (!llist_add(&req->io_task_work.node, &tctx->task_list)) return; if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) From patchwork Tue Aug 30 12:50:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dylan Yudaken X-Patchwork-Id: 12959292 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2A674C0502A for ; Tue, 30 Aug 2022 12:50:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229697AbiH3Mur (ORCPT ); Tue, 30 Aug 2022 08:50:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43392 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229982AbiH3Mup (ORCPT ); Tue, 30 Aug 2022 08:50:45 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 11BA1B1B83 for ; Tue, 30 Aug 2022 05:50:44 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UApFDt031298 for ; Tue, 30 Aug 2022 05:50:43 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=lQfLjUA9Wxmh/bZU7j4++b28yG85+qr8HmcaqcrdUWQ=; b=Pmx1nWnTKZAhrFy2BrOzfCRYXqQTyEFiaNtQ71d/kTQuTUy/vXYOaLccIvlCAsF5W2NE EIBlc6oS83MVeYtdYZyV142dxePLMul1XbOMdBStkLOJ4pqu5Hyc4WzhVuECd2zKBnI3 yucn6E6yLmgnAGo2lMEJpdrLyR33pwX2xT4= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9h5dghxm-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 05:50:43 -0700 Received: from twshared0646.06.ash9.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:11d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 05:50:41 -0700 Received: by devbig038.lla2.facebook.com (Postfix, from userid 572232) id E369155BF50C; Tue, 30 Aug 2022 05:50:30 -0700 (PDT) From: Dylan Yudaken To: Jens Axboe , Pavel Begunkov , CC: , Dylan Yudaken Subject: [PATCH for-next v4 2/7] io_uring: introduce io_has_work Date: Tue, 30 Aug 2022 05:50:08 -0700 Message-ID: <20220830125013.570060-3-dylany@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830125013.570060-1-dylany@fb.com> References: <20220830125013.570060-1-dylany@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: WwGLPEx3hEUYwL0plp2O1ESJeJvbCsnd X-Proofpoint-ORIG-GUID: WwGLPEx3hEUYwL0plp2O1ESJeJvbCsnd X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_07,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org This will be used later to know if the ring has outstanding work. 
Right now just if there is overflow CQEs to copy to the main CQE ring, but later will include deferred tasks Signed-off-by: Dylan Yudaken --- io_uring/io_uring.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 41eaf5ec70df..7998dc23360f 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2145,6 +2145,11 @@ struct io_wait_queue { unsigned nr_timeouts; }; +static inline bool io_has_work(struct io_ring_ctx *ctx) +{ + return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq); +} + static inline bool io_should_wake(struct io_wait_queue *iowq) { struct io_ring_ctx *ctx = iowq->ctx; @@ -2163,13 +2168,13 @@ static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode, { struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue, wq); + struct io_ring_ctx *ctx = iowq->ctx; /* * Cannot safely flush overflowed CQEs from here, ensure we wake up * the task, and the next invocation will do it. */ - if (io_should_wake(iowq) || - test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &iowq->ctx->check_cq)) + if (io_should_wake(iowq) || io_has_work(ctx)) return autoremove_wake_function(curr, mode, wake_flags, key); return -1; } @@ -2505,8 +2510,8 @@ static __poll_t io_uring_poll(struct file *file, poll_table *wait) * Users may get EPOLLIN meanwhile seeing nothing in cqring, this * pushs them to do the flush. */ - if (io_cqring_events(ctx) || - test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) + + if (io_cqring_events(ctx) || io_has_work(ctx)) mask |= EPOLLIN | EPOLLRDNORM; return mask; From patchwork Tue Aug 30 12:50:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dylan Yudaken X-Patchwork-Id: 12959289 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BDB65ECAAA1 for ; Tue, 30 Aug 2022 12:50:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229947AbiH3Muh (ORCPT ); Tue, 30 Aug 2022 08:50:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42964 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229976AbiH3Mug (ORCPT ); Tue, 30 Aug 2022 08:50:36 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A26B2A3D39 for ; Tue, 30 Aug 2022 05:50:35 -0700 (PDT) Received: from pps.filterd (m0109331.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27U7a7Mi009001 for ; Tue, 30 Aug 2022 05:50:34 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=G2A4kzMWmMHwVR1gIxGsEfytt8rlUmw0cK1988HKQqU=; b=mLrjKsjENqHo6PG9QVEXeJfUy6T8C3APDVyvBscOBYybcs5PDgLfhdM8YEt0PkJiWNFe 9a61QhBf85H/h4H9VPOlqUX6CjZvydVR9jHx6vNcFVUPyqHli5nfcWd2gg08hAWi4hDR X2f6wOA8GlU56+h3Fe8zm8Ba6eIcEbPhin4= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9e9ysbg1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 05:50:34 -0700 Received: from twshared8288.05.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com 
(2620:10d:c0a8:82::c) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 05:50:33 -0700 Received: by devbig038.lla2.facebook.com (Postfix, from userid 572232) id EA5C655BF50E; Tue, 30 Aug 2022 05:50:30 -0700 (PDT) From: Dylan Yudaken To: Jens Axboe , Pavel Begunkov , CC: , Dylan Yudaken Subject: [PATCH for-next v4 3/7] io_uring: do not run task work at the start of io_uring_enter Date: Tue, 30 Aug 2022 05:50:09 -0700 Message-ID: <20220830125013.570060-4-dylany@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830125013.570060-1-dylany@fb.com> References: <20220830125013.570060-1-dylany@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: 794CRfYyXisXU6acie3XT3OWcnqCeKPl X-Proofpoint-GUID: 794CRfYyXisXU6acie3XT3OWcnqCeKPl X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_07,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org This is not needed, and it is normally better to wait for task work until after submissions. This will allow greater batching if either work arrives in the meanwhile, or if the submissions cause task work to be queued up. For SQPOLL this also no longer runs task work, but this is handled inside the SQPOLL loop anyway. For IOPOLL io_iopoll_check will run task work anyway And otherwise io_cqring_wait will run task work Suggested-by: Pavel Begunkov Signed-off-by: Dylan Yudaken --- io_uring/io_uring.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 7998dc23360f..329d5b9d448e 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2991,8 +2991,6 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, struct fd f; long ret; - io_run_task_work(); - if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP | IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG | IORING_ENTER_REGISTERED_RING))) From patchwork Tue Aug 30 12:50:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dylan Yudaken X-Patchwork-Id: 12959293 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1037EECAAD4 for ; Tue, 30 Aug 2022 12:50:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229978AbiH3Mur (ORCPT ); Tue, 30 Aug 2022 08:50:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43446 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229954AbiH3Muq (ORCPT ); Tue, 30 Aug 2022 08:50:46 -0400 Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 89BCDAE9C8 for ; Tue, 30 Aug 2022 05:50:44 -0700 (PDT) Received: from pps.filterd (m0001303.ppops.net [127.0.0.1]) by m0001303.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 27TMpNVG015625 for ; Tue, 30 Aug 2022 05:50:43 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : content-type : content-transfer-encoding : mime-version; s=facebook; bh=H4TVGHqw1eGMuepaH71uPv++6POI1G5nbkjTU/AOufs=; 
b=VrU+nj0ieZHr4X3Ib9vU0ssSPzZ4yYH44956fWe6WbnvuW1/zOSsJnl/u2wuFhlhS/Nj 8tMpr1X98spppr9bzMarxMu6vdodm3wrLa8OUf0c+YN6p/wLatjokew+E2/CRqSN8bfX fE98gmo53CP+HQ4GX7aE3bDnOf12aQtIX7U= Received: from mail.thefacebook.com ([163.114.132.120]) by m0001303.ppops.net (PPS) with ESMTPS id 3j7exygh8c-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 05:50:43 -0700 Received: from twshared0646.06.ash9.facebook.com (2620:10d:c085:108::8) by mail.thefacebook.com (2620:10d:c085:21d::4) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 05:50:41 -0700 Received: by devbig038.lla2.facebook.com (Postfix, from userid 572232) id F088855BF510; Tue, 30 Aug 2022 05:50:30 -0700 (PDT) From: Dylan Yudaken To: Jens Axboe , Pavel Begunkov , CC: , Dylan Yudaken Subject: [PATCH for-next v4 4/7] io_uring: add IORING_SETUP_DEFER_TASKRUN Date: Tue, 30 Aug 2022 05:50:10 -0700 Message-ID: <20220830125013.570060-5-dylany@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830125013.570060-1-dylany@fb.com> References: <20220830125013.570060-1-dylany@fb.com> X-FB-Internal: Safe X-Proofpoint-GUID: 06EyFhZoBDawiFZM-tqzTj22mzUeH9sx X-Proofpoint-ORIG-GUID: 06EyFhZoBDawiFZM-tqzTj22mzUeH9sx X-Proofpoint-UnRewURL: 0 URL was un-rewritten MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_07,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Allow deferring async tasks until the user calls io_uring_enter(2) with the IORING_ENTER_GETEVENTS flag. Enable this mode with a flag at io_uring_setup time. This functionality requires that the later io_uring_enter will be called from the same submission task, and therefore restrict this flag to work only when IORING_SETUP_SINGLE_ISSUER is also set. Being able to hand pick when tasks are run prevents the problem where there is current work to be done, however task work runs anyway. For example, a common workload would obtain a batch of CQEs, and process each one. Interrupting this to additional taskwork would add latency but not gain anything. If instead task work is deferred to just before more CQEs are obtained then no additional latency is added. The way this is implemented is by trying to keep task work local to a io_ring_ctx, rather than to the submission task. This is required, as the application will want to wake up only a single io_ring_ctx at a time to process work, and so the lists of work have to be kept separate. This has some other benefits like not having to check the task continually in handle_tw_list (and potentially unlocking/locking those), and reducing locks in the submit & process completions path. There are networking cases where using this option can reduce request latency by 50%. For example a contrived example using [1] where the client sends 2k data and receives the same data back while doing some system calls (to trigger task work) shows this reduction. The reason ends up being that if sending responses is delayed by processing task work, then the client side sits idle. Whereas reordering the sends first means that the client runs it's workload in parallel with the local task work. 
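As an illustration of the intended usage (not part of the patch): a minimal userspace sketch that sets up a ring with the new flag via the raw syscalls. The flag values mirror the uapi hunk below; the syscall numbers are assumed to be available from <sys/syscall.h> on a current system, and real code would of course submit SQEs and handle errors properly.

/* minimal sketch: create a DEFER_TASKRUN ring and reap from the submitter task */
#include <linux/io_uring.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef IORING_SETUP_SINGLE_ISSUER
#define IORING_SETUP_SINGLE_ISSUER	(1U << 12)
#endif
#ifndef IORING_SETUP_DEFER_TASKRUN
#define IORING_SETUP_DEFER_TASKRUN	(1U << 13)	/* added by this patch */
#endif

int main(void)
{
	struct io_uring_params p;
	int fd;

	memset(&p, 0, sizeof(p));
	/* DEFER_TASKRUN is only accepted together with SINGLE_ISSUER */
	p.flags = IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN;

	fd = syscall(__NR_io_uring_setup, 8, &p);
	if (fd < 0) {
		perror("io_uring_setup");	/* EINVAL on kernels without the flag */
		return 1;
	}

	/*
	 * Deferred task work only runs when the submitting task calls
	 * io_uring_enter() with IORING_ENTER_GETEVENTS, so the reap step of
	 * the event loop must happen on this same thread.
	 */
	syscall(__NR_io_uring_enter, fd, 0, 0, IORING_ENTER_GETEVENTS, NULL, 0);

	close(fd);
	return 0;
}

With min_complete left at 0 the enter call just flushes deferred work and returns; a real event loop would pass the number of completions it wants to wait for.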
[1]: Using https://github.com/DylanZA/netbench/tree/defer_run Client: ./netbench --client_only 1 --control_port 10000 --host --tx "epoll --threads 16 --per_thread 1 --size 2048 --resp 2048 --workload 1000" Server: ./netbench --server_only 1 --control_port 10000 --rx "io_uring --defer_taskrun 0 --workload 100" --rx "io_uring --defer_taskrun 1 --workload 100" Signed-off-by: Dylan Yudaken --- include/linux/io_uring_types.h | 2 + include/uapi/linux/io_uring.h | 7 ++ io_uring/cancel.c | 2 +- io_uring/io_uring.c | 147 +++++++++++++++++++++++++++++---- io_uring/io_uring.h | 29 ++++++- io_uring/rsrc.c | 2 +- 6 files changed, 168 insertions(+), 21 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 677a25d44d7f..d56ff2185168 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -301,6 +301,8 @@ struct io_ring_ctx { struct io_hash_table cancel_table; bool poll_multi_queue; + struct llist_head work_llist; + struct list_head io_buffers_comp; } ____cacheline_aligned_in_smp; diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 9e0b5c8d92ce..48e5c70e0baf 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -157,6 +157,13 @@ enum { */ #define IORING_SETUP_SINGLE_ISSUER (1U << 12) +/* + * Defer running task work to get events. + * Rather than running bits of task work whenever the task transitions + * try to do it just before it is needed. + */ +#define IORING_SETUP_DEFER_TASKRUN (1U << 13) + enum io_uring_op { IORING_OP_NOP, IORING_OP_READV, diff --git a/io_uring/cancel.c b/io_uring/cancel.c index 5fc5d3e80fcb..2291a53cdabd 100644 --- a/io_uring/cancel.c +++ b/io_uring/cancel.c @@ -292,7 +292,7 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg) break; mutex_unlock(&ctx->uring_lock); - ret = io_run_task_work_sig(); + ret = io_run_task_work_sig(ctx); if (ret < 0) { mutex_lock(&ctx->uring_lock); break; diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 329d5b9d448e..9f90b0633de7 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -142,7 +142,7 @@ static bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx, static void io_dismantle_req(struct io_kiocb *req); static void io_clean_op(struct io_kiocb *req); static void io_queue_sqe(struct io_kiocb *req); - +static void io_move_task_work_from_local(struct io_ring_ctx *ctx); static void __io_submit_flush_completions(struct io_ring_ctx *ctx); static struct kmem_cache *req_cachep; @@ -316,6 +316,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p) INIT_LIST_HEAD(&ctx->rsrc_ref_list); INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work); init_llist_head(&ctx->rsrc_put_llist); + init_llist_head(&ctx->work_llist); INIT_LIST_HEAD(&ctx->tctx_list); ctx->submit_state.free_list.next = NULL; INIT_WQ_LIST(&ctx->locked_free_list); @@ -1047,12 +1048,36 @@ void tctx_task_work(struct callback_head *cb) trace_io_uring_task_work_run(tctx, count, loops); } -void io_req_task_work_add(struct io_kiocb *req) +static void io_req_local_work_add(struct io_kiocb *req) +{ + struct io_ring_ctx *ctx = req->ctx; + + if (!llist_add(&req->io_task_work.node, &ctx->work_llist)) + return; + + if (unlikely(atomic_read(&req->task->io_uring->in_idle))) { + io_move_task_work_from_local(ctx); + return; + } + + if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) + atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags); + + io_cqring_wake(ctx); + +} + +static inline void __io_req_task_work_add(struct io_kiocb *req, 
bool allow_local) { struct io_uring_task *tctx = req->task->io_uring; struct io_ring_ctx *ctx = req->ctx; struct llist_node *node; + if (allow_local && ctx->flags & IORING_SETUP_DEFER_TASKRUN) { + io_req_local_work_add(req); + return; + } + /* task_work already pending, we're done */ if (!llist_add(&req->io_task_work.node, &tctx->task_list)) return; @@ -1074,6 +1099,73 @@ void io_req_task_work_add(struct io_kiocb *req) } } +void io_req_task_work_add(struct io_kiocb *req) +{ + __io_req_task_work_add(req, true); +} + +static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx) +{ + struct llist_node *node; + + node = llist_del_all(&ctx->work_llist); + while (node) { + struct io_kiocb *req = container_of(node, struct io_kiocb, + io_task_work.node); + + node = node->next; + __io_req_task_work_add(req, false); + } +} + +int io_run_local_work(struct io_ring_ctx *ctx) +{ + bool locked; + struct llist_node *node; + struct llist_node fake; + struct llist_node *current_final = NULL; + int ret; + + if (unlikely(ctx->submitter_task != current)) { + /* maybe this is before any submissions */ + if (!ctx->submitter_task) + return 0; + + return -EEXIST; + } + + locked = mutex_trylock(&ctx->uring_lock); + + node = io_llist_xchg(&ctx->work_llist, &fake); + ret = 0; +again: + while (node != current_final) { + struct llist_node *next = node->next; + struct io_kiocb *req = container_of(node, struct io_kiocb, + io_task_work.node); + prefetch(container_of(next, struct io_kiocb, io_task_work.node)); + req->io_task_work.func(req, &locked); + ret++; + node = next; + } + + if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) + atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags); + + node = io_llist_cmpxchg(&ctx->work_llist, &fake, NULL); + if (node != &fake) { + current_final = &fake; + node = io_llist_xchg(&ctx->work_llist, &fake); + goto again; + } + + if (locked) { + io_submit_flush_completions(ctx); + mutex_unlock(&ctx->uring_lock); + } + return ret; +} + static void io_req_tw_post(struct io_kiocb *req, bool *locked) { io_req_complete_post(req); @@ -1285,8 +1377,10 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min) u32 tail = ctx->cached_cq_tail; mutex_unlock(&ctx->uring_lock); - io_run_task_work(); + ret = io_run_task_work_ctx(ctx); mutex_lock(&ctx->uring_lock); + if (ret < 0) + break; /* some requests don't go through iopoll_list */ if (tail != ctx->cached_cq_tail || @@ -2147,7 +2241,9 @@ struct io_wait_queue { static inline bool io_has_work(struct io_ring_ctx *ctx) { - return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq); + return test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq) || + ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) && + !llist_empty(&ctx->work_llist)); } static inline bool io_should_wake(struct io_wait_queue *iowq) @@ -2179,9 +2275,9 @@ static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode, return -1; } -int io_run_task_work_sig(void) +int io_run_task_work_sig(struct io_ring_ctx *ctx) { - if (io_run_task_work()) + if (io_run_task_work_ctx(ctx) > 0) return 1; if (task_sigpending(current)) return -EINTR; @@ -2197,7 +2293,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx, unsigned long check_cq; /* make sure we run task_work before checking for signals */ - ret = io_run_task_work_sig(); + ret = io_run_task_work_sig(ctx); if (ret || io_should_wake(iowq)) return ret; @@ -2228,12 +2324,14 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, int ret; do { + /* always run at least 1 task work to process local work */ 
+ ret = io_run_task_work_ctx(ctx); + if (ret < 0) + return ret; io_cqring_overflow_flush(ctx); if (io_cqring_events(ctx) >= min_events) return 0; - if (!io_run_task_work()) - break; - } while (1); + } while (ret > 0); if (sig) { #ifdef CONFIG_COMPAT @@ -2574,6 +2672,9 @@ static __cold void io_ring_exit_work(struct work_struct *work) * as nobody else will be looking for them. */ do { + if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) + io_move_task_work_from_local(ctx); + while (io_uring_try_cancel_requests(ctx, NULL, true)) cond_resched(); @@ -2769,13 +2870,15 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx, } } + if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) + ret |= io_run_local_work(ctx) > 0; ret |= io_cancel_defer_files(ctx, task, cancel_all); mutex_lock(&ctx->uring_lock); ret |= io_poll_remove_all(ctx, task, cancel_all); mutex_unlock(&ctx->uring_lock); ret |= io_kill_timeouts(ctx, task, cancel_all); if (task) - ret |= io_run_task_work(); + ret |= io_run_task_work() > 0; return ret; } @@ -3060,8 +3163,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, goto iopoll_locked; mutex_unlock(&ctx->uring_lock); } + if (flags & IORING_ENTER_GETEVENTS) { int ret2; + if (ctx->syscall_iopoll) { /* * We disallow the app entering submit/complete with @@ -3290,17 +3395,29 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p, if (ctx->flags & IORING_SETUP_SQPOLL) { /* IPI related flags don't make sense with SQPOLL */ if (ctx->flags & (IORING_SETUP_COOP_TASKRUN | - IORING_SETUP_TASKRUN_FLAG)) + IORING_SETUP_TASKRUN_FLAG | + IORING_SETUP_DEFER_TASKRUN)) goto err; ctx->notify_method = TWA_SIGNAL_NO_IPI; } else if (ctx->flags & IORING_SETUP_COOP_TASKRUN) { ctx->notify_method = TWA_SIGNAL_NO_IPI; } else { - if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) + if (ctx->flags & IORING_SETUP_TASKRUN_FLAG && + !(ctx->flags & IORING_SETUP_DEFER_TASKRUN)) goto err; ctx->notify_method = TWA_SIGNAL; } + /* + * For DEFER_TASKRUN we require the completion task to be the same as the + * submission task. This implies that there is only one submitter, so enforce + * that. + */ + if (ctx->flags & IORING_SETUP_DEFER_TASKRUN && + !(ctx->flags & IORING_SETUP_SINGLE_ISSUER)) { + goto err; + } + /* * This is just grabbed for accounting purposes. 
When a process exits, * the mm is exited and dropped before the files, hence we need to hang @@ -3401,7 +3518,7 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params) IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL | IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG | IORING_SETUP_SQE128 | IORING_SETUP_CQE32 | - IORING_SETUP_SINGLE_ISSUER)) + IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN)) return -EINVAL; return io_uring_create(entries, &p, params); @@ -3873,7 +3990,7 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode, ctx = f.file->private_data; - io_run_task_work(); + io_run_task_work_ctx(ctx); mutex_lock(&ctx->uring_lock); ret = __io_uring_register(ctx, opcode, arg, nr_args); diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 2f73f83af960..f417d75d7bc1 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -26,7 +26,8 @@ enum { struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx); bool io_req_cqe_overflow(struct io_kiocb *req); -int io_run_task_work_sig(void); +int io_run_task_work_sig(struct io_ring_ctx *ctx); +int io_run_local_work(struct io_ring_ctx *ctx); void io_req_complete_failed(struct io_kiocb *req, s32 res); void __io_req_complete(struct io_kiocb *req, unsigned issue_flags); void io_req_complete_post(struct io_kiocb *req); @@ -221,17 +222,37 @@ static inline unsigned int io_sqring_entries(struct io_ring_ctx *ctx) return smp_load_acquire(&rings->sq.tail) - ctx->cached_sq_head; } -static inline bool io_run_task_work(void) +static inline int io_run_task_work(void) { if (test_thread_flag(TIF_NOTIFY_SIGNAL)) { __set_current_state(TASK_RUNNING); clear_notify_signal(); if (task_work_pending(current)) task_work_run(); - return true; + return 1; } - return false; + return 0; +} + +static inline int io_run_task_work_ctx(struct io_ring_ctx *ctx) +{ + int ret = 0; + int ret2; + + if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) + ret = io_run_local_work(ctx); + + /* want to run this after in case more is added */ + ret2 = io_run_task_work(); + + /* Try propagate error in favour of if tasks were run, + * but still make sure to run them if requested + */ + if (ret >= 0) + ret += ret2; + + return ret; } static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked) diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 71359a4d0bd4..80cda6e2067f 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -343,7 +343,7 @@ __cold static int io_rsrc_ref_quiesce(struct io_rsrc_data *data, flush_delayed_work(&ctx->rsrc_put_work); reinit_completion(&data->done); - ret = io_run_task_work_sig(); + ret = io_run_task_work_sig(ctx); mutex_lock(&ctx->uring_lock); } while (ret >= 0); data->quiesce = false; From patchwork Tue Aug 30 12:50:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dylan Yudaken X-Patchwork-Id: 12959294 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8476C0502C for ; Tue, 30 Aug 2022 12:50:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229954AbiH3Mus (ORCPT ); Tue, 30 Aug 2022 08:50:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43446 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229982AbiH3Mur (ORCPT ); Tue, 30 Aug 2022 08:50:47 -0400 
Received: from mx0a-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1B224A6C2E for ; Tue, 30 Aug 2022 05:50:46 -0700 (PDT) Received: from pps.filterd (m0089730.ppops.net [127.0.0.1]) by m0089730.ppops.net (8.17.1.5/8.17.1.5) with ESMTP id 27UBRurc009732 for ; Tue, 30 Aug 2022 05:50:45 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=PiB4E1TLr4n+pCjSR+mLg6tqXq3iq6mkgDPp7MVyPVE=; b=E+r1sFZAZrQ57+i/1h4uTSMQmcvEdCHXKotu85KgOS/8P5ZySzhLfNAZuaXai9Fq1v7r vVDUhc4taKY7WG6vqgC6jvKUK9NVyO2lBqO8axKL5jvbba2H4Zs+BT1KY2E+ckYEXefL +RfG9KnbROnrFoakoOjZgLBM4GBSTE9WSMg= Received: from mail.thefacebook.com ([163.114.132.120]) by m0089730.ppops.net (PPS) with ESMTPS id 3j9hpwgch9-5 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 05:50:45 -0700 Received: from twshared0646.06.ash9.facebook.com (2620:10d:c085:208::11) by mail.thefacebook.com (2620:10d:c085:11d::5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 05:50:41 -0700 Received: by devbig038.lla2.facebook.com (Postfix, from userid 572232) id 0A21B55BF513; Tue, 30 Aug 2022 05:50:31 -0700 (PDT) From: Dylan Yudaken To: Jens Axboe , Pavel Begunkov , CC: , Dylan Yudaken Subject: [PATCH for-next v4 5/7] io_uring: move io_eventfd_put Date: Tue, 30 Aug 2022 05:50:11 -0700 Message-ID: <20220830125013.570060-6-dylany@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830125013.570060-1-dylany@fb.com> References: <20220830125013.570060-1-dylany@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: WJm902KjIWOT_miKwvjt-gwQc1CA_D2B X-Proofpoint-GUID: WJm902KjIWOT_miKwvjt-gwQc1CA_D2B X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_07,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Non functional change: move this function above io_eventfd_signal so it can be used from there Signed-off-by: Dylan Yudaken --- io_uring/io_uring.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 9f90b0633de7..c6e416be7ed3 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -478,6 +478,14 @@ static __cold void io_queue_deferred(struct io_ring_ctx *ctx) } } +static void io_eventfd_put(struct rcu_head *rcu) +{ + struct io_ev_fd *ev_fd = container_of(rcu, struct io_ev_fd, rcu); + + eventfd_ctx_put(ev_fd->cq_ev_fd); + kfree(ev_fd); +} + static void io_eventfd_signal(struct io_ring_ctx *ctx) { struct io_ev_fd *ev_fd; @@ -2468,14 +2476,6 @@ static int io_eventfd_register(struct io_ring_ctx *ctx, void __user *arg, return 0; } -static void io_eventfd_put(struct rcu_head *rcu) -{ - struct io_ev_fd *ev_fd = container_of(rcu, struct io_ev_fd, rcu); - - eventfd_ctx_put(ev_fd->cq_ev_fd); - kfree(ev_fd); -} - static int io_eventfd_unregister(struct io_ring_ctx *ctx) { struct io_ev_fd *ev_fd; From patchwork Tue Aug 30 12:50:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dylan Yudaken X-Patchwork-Id: 12959295 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org 
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B40EAECAAA1 for ; Tue, 30 Aug 2022 12:50:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229979AbiH3Muu (ORCPT ); Tue, 30 Aug 2022 08:50:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43598 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229982AbiH3Mut (ORCPT ); Tue, 30 Aug 2022 08:50:49 -0400 Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A041EB2D8E for ; Tue, 30 Aug 2022 05:50:48 -0700 (PDT) Received: from pps.filterd (m0148461.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27TMpLa5020037 for ; Tue, 30 Aug 2022 05:50:48 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=INkrT8gD4GI6jBOyEXw3GPXdNvWYN7HGXZSqC/8tnME=; b=PM19hIldYL4KkVWo3d0pLOnCvl7spltgyqRx9iB+MCXMg/1/5GeJg+VWKTLRK8g0UNNv VMMNqfV8zbpPfKlGxN8XSPnsiudDNznxvI0Ylyabd9wq2jmCgOQSyRDOfEDDiE5kK2Qy +PA3OGTR21B0efqcj/HYJ5Y2yoaeK0riAE0= Received: from maileast.thefacebook.com ([163.114.130.16]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j94gyc393-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 05:50:47 -0700 Received: from twshared8288.05.ash9.facebook.com (2620:10d:c0a8:1b::d) by mail.thefacebook.com (2620:10d:c0a8:83::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 05:50:44 -0700 Received: by devbig038.lla2.facebook.com (Postfix, from userid 572232) id 0EAB055BF515; Tue, 30 Aug 2022 05:50:31 -0700 (PDT) From: Dylan Yudaken To: Jens Axboe , Pavel Begunkov , CC: , Dylan Yudaken Subject: [PATCH for-next v4 6/7] io_uring: signal registered eventfd to process deferred task work Date: Tue, 30 Aug 2022 05:50:12 -0700 Message-ID: <20220830125013.570060-7-dylany@fb.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220830125013.570060-1-dylany@fb.com> References: <20220830125013.570060-1-dylany@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-ORIG-GUID: 5CGGXBeqc2aU1fO5gqdQh82tCf-5ElLc X-Proofpoint-GUID: 5CGGXBeqc2aU1fO5gqdQh82tCf-5ElLc X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_07,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Some workloads rely on a registered eventfd (via io_uring_register_eventfd(3)) in order to wake up and process the io_uring. In the case of a ring setup with IORING_SETUP_DEFER_TASKRUN, that eventfd also needs to be signalled when there are tasks to run. This changes an old behaviour which assumed 1 eventfd signal implied at least 1 CQE, however only when this new flag is set (and so old users will not notice). This should be expected with the IORING_SETUP_DEFER_TASKRUN flag as it is not guaranteed that every task will result in a CQE. 
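To make the consequence concrete, here is a rough sketch of the consumer side under the new semantics (illustrative only, not part of the patch; it assumes the eventfd was registered with IORING_REGISTER_EVENTFD beforehand):

#include <linux/io_uring.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * With IORING_SETUP_DEFER_TASKRUN an eventfd wakeup may mean "deferred task
 * work is pending" rather than "a CQE is ready", so the loop must call
 * io_uring_enter(GETEVENTS) and tolerate finding an empty CQ afterwards.
 */
void wait_and_reap(int ring_fd, int efd)
{
	uint64_t cnt;

	while (read(efd, &cnt, sizeof(cnt)) == sizeof(cnt)) {
		/* run deferred task work and flush completions */
		syscall(__NR_io_uring_enter, ring_fd, 0, 0,
			IORING_ENTER_GETEVENTS, NULL, 0);
		/* ... harvest whatever CQEs are now visible (possibly none) ... */
	}
}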
Signed-off-by: Dylan Yudaken --- include/linux/io_uring_types.h | 1 + io_uring/io_uring.c | 75 ++++++++++++++++++++++++---------- 2 files changed, 55 insertions(+), 21 deletions(-) diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index d56ff2185168..42494176434a 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -184,6 +184,7 @@ struct io_ev_fd { struct eventfd_ctx *cq_ev_fd; unsigned int eventfd_async: 1; struct rcu_head rcu; + atomic_t refs; }; struct io_alloc_cache { diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index c6e416be7ed3..fffa81e498a4 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -478,33 +478,33 @@ static __cold void io_queue_deferred(struct io_ring_ctx *ctx) } } + +static inline void __io_eventfd_put(struct io_ev_fd *ev_fd) +{ + if (atomic_dec_and_test(&ev_fd->refs)) { + eventfd_ctx_put(ev_fd->cq_ev_fd); + kfree(ev_fd); + } +} + +static void io_eventfd_signal_put(struct rcu_head *rcu) +{ + struct io_ev_fd *ev_fd = container_of(rcu, struct io_ev_fd, rcu); + + eventfd_signal(ev_fd->cq_ev_fd, 1); + __io_eventfd_put(ev_fd); +} + static void io_eventfd_put(struct rcu_head *rcu) { struct io_ev_fd *ev_fd = container_of(rcu, struct io_ev_fd, rcu); - eventfd_ctx_put(ev_fd->cq_ev_fd); - kfree(ev_fd); + __io_eventfd_put(ev_fd); } static void io_eventfd_signal(struct io_ring_ctx *ctx) { - struct io_ev_fd *ev_fd; - bool skip; - - spin_lock(&ctx->completion_lock); - /* - * Eventfd should only get triggered when at least one event has been - * posted. Some applications rely on the eventfd notification count only - * changing IFF a new CQE has been added to the CQ ring. There's no - * depedency on 1:1 relationship between how many times this function is - * called (and hence the eventfd count) and number of CQEs posted to the - * CQ ring. - */ - skip = ctx->cached_cq_tail == ctx->evfd_last_cq_tail; - ctx->evfd_last_cq_tail = ctx->cached_cq_tail; - spin_unlock(&ctx->completion_lock); - if (skip) - return; + struct io_ev_fd *ev_fd = NULL; rcu_read_lock(); /* @@ -522,13 +522,43 @@ static void io_eventfd_signal(struct io_ring_ctx *ctx) goto out; if (READ_ONCE(ctx->rings->cq_flags) & IORING_CQ_EVENTFD_DISABLED) goto out; + if (ev_fd->eventfd_async && !io_wq_current_is_worker()) + goto out; - if (!ev_fd->eventfd_async || io_wq_current_is_worker()) + if (likely(eventfd_signal_allowed())) { eventfd_signal(ev_fd->cq_ev_fd, 1); + } else { + atomic_inc(&ev_fd->refs); + call_rcu(&ev_fd->rcu, io_eventfd_signal_put); + } + out: rcu_read_unlock(); } +static void io_eventfd_flush_signal(struct io_ring_ctx *ctx) +{ + bool skip; + + spin_lock(&ctx->completion_lock); + + /* + * Eventfd should only get triggered when at least one event has been + * posted. Some applications rely on the eventfd notification count + * only changing IFF a new CQE has been added to the CQ ring. There's + * no depedency on 1:1 relationship between how many times this + * function is called (and hence the eventfd count) and number of CQEs + * posted to the CQ ring. 
+ */ + skip = ctx->cached_cq_tail == ctx->evfd_last_cq_tail; + ctx->evfd_last_cq_tail = ctx->cached_cq_tail; + spin_unlock(&ctx->completion_lock); + if (skip) + return; + + io_eventfd_signal(ctx); +} + void __io_commit_cqring_flush(struct io_ring_ctx *ctx) { if (ctx->off_timeout_used || ctx->drain_active) { @@ -540,7 +570,7 @@ void __io_commit_cqring_flush(struct io_ring_ctx *ctx) spin_unlock(&ctx->completion_lock); } if (ctx->has_evfd) - io_eventfd_signal(ctx); + io_eventfd_flush_signal(ctx); } static inline void io_cqring_ev_posted(struct io_ring_ctx *ctx) @@ -1071,6 +1101,8 @@ static void io_req_local_work_add(struct io_kiocb *req) if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags); + if (ctx->has_evfd) + io_eventfd_signal(ctx); io_cqring_wake(ctx); } @@ -2473,6 +2505,7 @@ static int io_eventfd_register(struct io_ring_ctx *ctx, void __user *arg, ev_fd->eventfd_async = eventfd_async; ctx->has_evfd = true; rcu_assign_pointer(ctx->io_ev_fd, ev_fd); + atomic_set(&ev_fd->refs, 1); return 0; } From patchwork Tue Aug 30 12:50:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dylan Yudaken X-Patchwork-Id: 12959296 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15074ECAAA1 for ; Tue, 30 Aug 2022 12:51:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229976AbiH3MvB (ORCPT ); Tue, 30 Aug 2022 08:51:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44120 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229682AbiH3MvA (ORCPT ); Tue, 30 Aug 2022 08:51:00 -0400 Received: from mx0b-00082601.pphosted.com (mx0b-00082601.pphosted.com [67.231.153.30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 676A2E3988 for ; Tue, 30 Aug 2022 05:50:57 -0700 (PDT) Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 27UAp6W6031201 for ; Tue, 30 Aug 2022 05:50:56 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=facebook; bh=H8L613QkH+/6mzsHxTrmOmcDcC8HGo6hqXI+Iq604Sk=; b=NGjpRiADYvU67ygBLaTRz00IJ7DQn3tauC1nic/eTr7Qo9zkoLnK7Kfp0jleMW5WQcWJ MrcWg/I/zCC8euOiyEfoV4NS8OEn4jb2y+Dx/P6nCGfSiMWAhhSZ5FFNb49tXrdnkYzv DZVUeXjScCEC1ytNbTNTScBPFfBcjH809qk= Received: from mail.thefacebook.com ([163.114.132.120]) by mx0a-00082601.pphosted.com (PPS) with ESMTPS id 3j9h5dghyq-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NOT) for ; Tue, 30 Aug 2022 05:50:56 -0700 Received: from twshared0646.06.ash9.facebook.com (2620:10d:c085:108::4) by mail.thefacebook.com (2620:10d:c085:11d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.31; Tue, 30 Aug 2022 05:50:54 -0700 Received: by devbig038.lla2.facebook.com (Postfix, from userid 572232) id 16C8755BF517; Tue, 30 Aug 2022 05:50:31 -0700 (PDT) From: Dylan Yudaken To: Jens Axboe , Pavel Begunkov , CC: , Dylan Yudaken Subject: [PATCH for-next v4 7/7] io_uring: trace local task work run Date: Tue, 30 Aug 2022 05:50:13 -0700 Message-ID: <20220830125013.570060-8-dylany@fb.com> X-Mailer: 
git-send-email 2.30.2 In-Reply-To: <20220830125013.570060-1-dylany@fb.com> References: <20220830125013.570060-1-dylany@fb.com> MIME-Version: 1.0 X-FB-Internal: Safe X-Proofpoint-GUID: 8ye2-u6WZ5oKNcNXO0u2GabycwuCMw5e X-Proofpoint-ORIG-GUID: 8ye2-u6WZ5oKNcNXO0u2GabycwuCMw5e X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.517,FMLib:17.11.122.1 definitions=2022-08-30_07,2022-08-30_01,2022-06-22_01 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Add tracing for io_run_local_task_work Signed-off-by: Dylan Yudaken --- include/trace/events/io_uring.h | 29 +++++++++++++++++++++++++++++ io_uring/io_uring.c | 3 +++ 2 files changed, 32 insertions(+) diff --git a/include/trace/events/io_uring.h b/include/trace/events/io_uring.h index c5b21ff0ac85..936fd41bf147 100644 --- a/include/trace/events/io_uring.h +++ b/include/trace/events/io_uring.h @@ -655,6 +655,35 @@ TRACE_EVENT(io_uring_short_write, __entry->wanted, __entry->got) ); +/* + * io_uring_local_work_run - ran ring local task work + * + * @tctx: pointer to a io_uring_ctx + * @count: how many functions it ran + * @loops: how many loops it ran + * + */ +TRACE_EVENT(io_uring_local_work_run, + + TP_PROTO(void *ctx, int count, unsigned int loops), + + TP_ARGS(ctx, count, loops), + + TP_STRUCT__entry ( + __field(void *, ctx ) + __field(int, count ) + __field(unsigned int, loops ) + ), + + TP_fast_assign( + __entry->ctx = ctx; + __entry->count = count; + __entry->loops = loops; + ), + + TP_printk("ring %p, count %d, loops %u", __entry->ctx, __entry->count, __entry->loops) +); + #endif /* _TRACE_IO_URING_H */ /* This part must be outside protection */ diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index fffa81e498a4..cdd8d10e9638 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -1165,6 +1165,7 @@ int io_run_local_work(struct io_ring_ctx *ctx) struct llist_node fake; struct llist_node *current_final = NULL; int ret; + unsigned int loops = 1; if (unlikely(ctx->submitter_task != current)) { /* maybe this is before any submissions */ @@ -1194,6 +1195,7 @@ int io_run_local_work(struct io_ring_ctx *ctx) node = io_llist_cmpxchg(&ctx->work_llist, &fake, NULL); if (node != &fake) { + loops++; current_final = &fake; node = io_llist_xchg(&ctx->work_llist, &fake); goto again; @@ -1203,6 +1205,7 @@ int io_run_local_work(struct io_ring_ctx *ctx) io_submit_flush_completions(ctx); mutex_unlock(&ctx->uring_lock); } + trace_io_uring_local_work_run(ctx, ret, loops); return ret; }
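For completeness, the new tracepoint can be consumed like the existing io_uring events. A hedged sketch, assuming tracefs is mounted at /sys/kernel/tracing and the event is exported under the io_uring group as the trace_io_uring_local_work_run name implies:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *path =
		"/sys/kernel/tracing/events/io_uring/io_uring_local_work_run/enable";
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");	/* needs root and a kernel carrying this patch */
		return 1;
	}
	if (write(fd, "1", 1) != 1)
		perror("write");
	close(fd);

	/*
	 * Records then appear in /sys/kernel/tracing/trace in the TP_printk
	 * format above, e.g. "ring 0xffff..., count 3, loops 1".
	 */
	return 0;
}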