From patchwork Tue Jul 11 20:33:21 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13309356
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, Jens Axboe
Subject: [PATCH 1/5] iomap: complete polled writes inline
Date: Tue, 11 Jul 2023 14:33:21 -0600
Message-Id: <20230711203325.208957-2-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230711203325.208957-1-axboe@kernel.dk>
References: <20230711203325.208957-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

Polled IO is always reaped in the context of the process itself, so it
does not need to be punted to a workqueue for the completion. This is
different than IRQ driven IO, where iomap_dio_bio_end_io() will be invoked
from hard/soft IRQ context. For those cases we currently need to punt to
a workqueue for further processing. For the polled case, since it's the
task itself reaping completions, we're already in task context. That makes
it identical to the sync completion case.

Testing a basic QD 1..8 dio random write with polled IO with the following
fio job:

fio --name=polled-dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=$depth --hipri=1

yields:

        Stock   Patched         Diff
=======================================
QD1     180K    201K            +11%
QD2     356K    394K            +10%
QD4     608K    650K            +7%
QD8     827K    831K            +0.5%

which shows a nice win, particularly for lower queue depth writes.
This is expected, as higher queue depths will be busy polling
completions while the offloaded workqueue completions can happen in
parallel.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index ea3b868c8355..343bde5d50d3 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -161,15 +161,16 @@ void iomap_dio_bio_end_io(struct bio *bio)
 			struct task_struct *waiter = dio->submit.waiter;
 			WRITE_ONCE(dio->submit.waiter, NULL);
 			blk_wake_io_task(waiter);
-		} else if (dio->flags & IOMAP_DIO_WRITE) {
+		} else if ((bio->bi_opf & REQ_POLLED) ||
+			   !(dio->flags & IOMAP_DIO_WRITE)) {
+			WRITE_ONCE(dio->iocb->private, NULL);
+			iomap_dio_complete_work(&dio->aio.work);
+		} else {
 			struct inode *inode = file_inode(dio->iocb->ki_filp);
 
 			WRITE_ONCE(dio->iocb->private, NULL);
 			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
 			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
-		} else {
-			WRITE_ONCE(dio->iocb->private, NULL);
-			iomap_dio_complete_work(&dio->aio.work);
 		}
 	}
 

From patchwork Tue Jul 11 20:33:22 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13309357
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, Jens Axboe
Subject: [PATCH 2/5] fs: add IOCB flags related to passing back dio completions
Date: Tue, 11 Jul 2023 14:33:22 -0600
Message-Id: <20230711203325.208957-3-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230711203325.208957-1-axboe@kernel.dk>
References: <20230711203325.208957-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

Async dio completions generally happen from hard/soft IRQ context, which
means that users like iomap may need to defer some of the completion
handling to a workqueue. This is less efficient than having the original
issuer handle it, like we do for sync IO, and it adds latency to the
completions.

Add IOCB_DIO_DEFER, which the issuer can set if it is able to safely
punt these completions to a safe context.
If the dio handler is aware of this flag, assign a callback handler in
kiocb->dio_complete and associated data in kiocb->private. The issuer
will then call this handler with that data from task context.

No functional changes in this patch.

Signed-off-by: Jens Axboe
---
 include/linux/fs.h | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6867512907d6..115382f66d79 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -338,6 +338,16 @@ enum rw_hint {
 #define IOCB_NOIO		(1 << 20)
 /* can use bio alloc cache */
 #define IOCB_ALLOC_CACHE	(1 << 21)
+/*
+ * IOCB_DIO_DEFER can be set by the iocb owner, to indicate that the
+ * iocb completion can be passed back to the owner for execution from a safe
+ * context rather than needing to be punted through a workqueue. If this
+ * flag is set, the completion handling may set iocb->dio_complete to a
+ * handler, which the issuer will then call from task context to complete
+ * the processing of the iocb. iocb->private should then also be set to
+ * the argument being passed to this handler.
+ */
+#define IOCB_DIO_DEFER		(1 << 22)
 
 /* for use in trace events */
 #define TRACE_IOCB_STRINGS \
@@ -351,7 +361,8 @@ enum rw_hint {
 	{ IOCB_WRITE,		"WRITE" }, \
 	{ IOCB_WAITQ,		"WAITQ" }, \
 	{ IOCB_NOIO,		"NOIO" }, \
-	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }
+	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }, \
+	{ IOCB_DIO_DEFER,	"DIO_DEFER" }
 
 struct kiocb {
 	struct file		*ki_filp;
@@ -360,7 +371,22 @@ struct kiocb {
 	void			*private;
 	int			ki_flags;
 	u16			ki_ioprio; /* See linux/ioprio.h */
-	struct wait_page_queue	*ki_waitq; /* for async buffered IO */
+	union {
+		/*
+		 * Only used for async buffered reads, where it denotes the
+		 * page waitqueue associated with completing the read. Valid
+		 * IFF IOCB_WAITQ is set.
+		 */
+		struct wait_page_queue	*ki_waitq;
+		/*
+		 * Can be used for O_DIRECT IO, where the completion handling
+		 * is punted back to the issuer of the IO. May only be set
+		 * if IOCB_DIO_DEFER is set by the issuer, and the issuer must
+		 * then check for presence of this handler when ki_complete is
+		 * invoked.
+		 */
+		ssize_t (*dio_complete)(void *data);
+	};
 };
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)

From patchwork Tue Jul 11 20:33:23 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13309358
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, Jens Axboe
Subject: [PATCH 3/5] io_uring/rw: add write support for IOCB_DIO_DEFER
Date: Tue, 11 Jul 2023 14:33:23 -0600
Message-Id: <20230711203325.208957-4-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230711203325.208957-1-axboe@kernel.dk>
References: <20230711203325.208957-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

If the filesystem dio handler understands IOCB_DIO_DEFER, we'll get a
kiocb->ki_complete() callback with kiocb->dio_complete set. In that case,
rather than complete the IO directly through task_work, queue up an
intermediate task_work handler that first processes this callback and
then immediately completes the request.

For XFS, this avoids a punt through a workqueue, which is a lot less
efficient and adds latency to lower queue depth (or sync) O_DIRECT writes.

Signed-off-by: Jens Axboe
---
 io_uring/rw.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1bce2208b65c..4ed378c70249 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -285,6 +285,14 @@ static inline int io_fixup_rw_res(struct io_kiocb *req, long res)
 
 void io_req_rw_complete(struct io_kiocb *req, struct io_tw_state *ts)
 {
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+
+	if (rw->kiocb.dio_complete) {
+		long res = rw->kiocb.dio_complete(rw->kiocb.private);
+
+		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	}
+
 	io_req_io_end(req);
 
 	if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
@@ -300,9 +308,11 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
 	struct io_rw *rw = container_of(kiocb, struct io_rw, kiocb);
 	struct io_kiocb *req = cmd_to_io_kiocb(rw);
 
-	if (__io_complete_rw_common(req, res))
-		return;
-	io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	if (!rw->kiocb.dio_complete) {
+		if (__io_complete_rw_common(req, res))
+			return;
+		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	}
 	req->io_task_work.func = io_req_rw_complete;
 	__io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE);
 }
@@ -914,7 +924,13 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 		__sb_writers_release(file_inode(req->file)->i_sb,
 					SB_FREEZE_WRITE);
 	}
-	kiocb->ki_flags |= IOCB_WRITE;
+
+	/*
+	 * Set IOCB_DIO_DEFER, stating that our handler groks deferring the
+	 * completion to task context.
+	 */
+	kiocb->ki_flags |= IOCB_WRITE | IOCB_DIO_DEFER;
+	kiocb->dio_complete = NULL;
 
 	if (likely(req->file->f_op->write_iter))
 		ret2 = call_write_iter(req->file, kiocb, &s->iter);

From patchwork Tue Jul 11 20:33:24 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13309359
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, Jens Axboe
Subject: [PATCH 4/5] iomap: add local 'iocb' variable in iomap_dio_bio_end_io()
Date: Tue, 11 Jul 2023 14:33:24 -0600
Message-Id: <20230711203325.208957-5-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230711203325.208957-1-axboe@kernel.dk>
References: <20230711203325.208957-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

We use dio->iocb multiple times, so add a local variable for the kiocb.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 343bde5d50d3..94ef78b25b76 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -157,18 +157,20 @@ void iomap_dio_bio_end_io(struct bio *bio)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
 
 	if (atomic_dec_and_test(&dio->ref)) {
+		struct kiocb *iocb = dio->iocb;
+
 		if (dio->wait_for_completion) {
 			struct task_struct *waiter = dio->submit.waiter;
 			WRITE_ONCE(dio->submit.waiter, NULL);
 			blk_wake_io_task(waiter);
 		} else if ((bio->bi_opf & REQ_POLLED) ||
 			   !(dio->flags & IOMAP_DIO_WRITE)) {
-			WRITE_ONCE(dio->iocb->private, NULL);
+			WRITE_ONCE(iocb->private, NULL);
 			iomap_dio_complete_work(&dio->aio.work);
 		} else {
-			struct inode *inode = file_inode(dio->iocb->ki_filp);
+			struct inode *inode = file_inode(iocb->ki_filp);
 
-			WRITE_ONCE(dio->iocb->private, NULL);
+			WRITE_ONCE(iocb->private, NULL);
 			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
 			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
 		}

From patchwork Tue Jul 11 20:33:25 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13309360
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, Jens Axboe
Subject: [PATCH 5/5] iomap: support IOCB_DIO_DEFER
Date: Tue, 11 Jul 2023 14:33:25 -0600
Message-Id: <20230711203325.208957-6-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230711203325.208957-1-axboe@kernel.dk>
References: <20230711203325.208957-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

If IOCB_DIO_DEFER is set, utilize that to set kiocb->dio_complete handler
and data for that callback. Rather than punt the completion to a
workqueue, we pass back the handler and data to the issuer and will get a
callback from a safe task context.

Using the following fio job to randomly dio write 4k blocks at
queue depths of 1..16:

fio --name=dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=16

shows the following results before and after this patch:

        Stock   Patched         Diff
=======================================
QD1     155K    162K            + 4.5%
QD2     290K    313K            + 7.9%
QD4     533K    597K            +12.0%
QD8     604K    827K            +36.9%
QD16    615K    845K            +37.4%

which shows nice wins all around. If we factored in per-IOP efficiency,
the wins look even nicer. This becomes apparent as queue depth rises,
as the offloaded workqueue completions run out of steam.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 94ef78b25b76..bd7b948a29a7 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -130,6 +130,11 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 }
 EXPORT_SYMBOL_GPL(iomap_dio_complete);
 
+static ssize_t iomap_dio_deferred_complete(void *data)
+{
+	return iomap_dio_complete(data);
+}
+
 static void iomap_dio_complete_work(struct work_struct *work)
 {
 	struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work);
@@ -167,6 +172,25 @@ void iomap_dio_bio_end_io(struct bio *bio)
 			   !(dio->flags & IOMAP_DIO_WRITE)) {
 			WRITE_ONCE(iocb->private, NULL);
 			iomap_dio_complete_work(&dio->aio.work);
+		} else if ((iocb->ki_flags & IOCB_DIO_DEFER) &&
+			   !(dio->flags & IOMAP_DIO_NEED_SYNC)) {
+			/* only polled IO cares about private cleared */
+			iocb->private = dio;
+			iocb->dio_complete = iomap_dio_deferred_complete;
+			/*
+			 * Invoke ->ki_complete() directly. We've assigned
+			 * our dio_complete callback handler, and since the
+			 * issuer set IOCB_DIO_DEFER, we know their
+			 * ki_complete handler will notice ->dio_complete
+			 * being set and will defer calling that handler
+			 * until it can be done from a safe task context.
+			 *
+			 * Note that the 'res' being passed in here is
+			 * not important for this case. The actual completion
+			 * value of the request will be gotten from dio_complete
+			 * when that is run by the issuer.
+			 */
+			iocb->ki_complete(iocb, 0);
 		} else {
 			struct inode *inode = file_inode(iocb->ki_filp);
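
For illustration only, the handshake that patches 2, 3 and 5 set up
between the dio handler and the issuer can be sketched in user space
roughly as follows. This is a simplified model, not kernel code: every
name prefixed with model_ or MODEL_ is hypothetical, the "IRQ" and "task"
contexts are simulated with a plain flag, and the workqueue punt is only
recorded rather than performed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
#define MODEL_IOCB_DIO_DEFER	(1 << 22)

struct model_kiocb {
	void *private;
	int ki_flags;
	void (*ki_complete)(struct model_kiocb *iocb, long res);
	ssize_t (*dio_complete)(void *data);
	long result;		/* value the issuer finally reports */
	int task_work_pending;
};

struct model_dio {
	ssize_t size;		/* bytes the write transferred */
	int completed;		/* set once completion work has run */
	int punted;		/* set if a workqueue punt would be needed */
};

/* 0 while the model pretends to run in IRQ context, 1 in task context. */
static int model_in_task;

/* Plays the role of iomap_dio_deferred_complete(). */
static ssize_t model_dio_deferred_complete(void *data)
{
	struct model_dio *dio = data;

	assert(model_in_task);	/* must never run from the "IRQ" side */
	dio->completed = 1;
	return dio->size;
}

/* Issuer's ->ki_complete(): only records that task work is needed. */
static void model_ki_complete(struct model_kiocb *iocb, long res)
{
	if (!iocb->dio_complete)
		iocb->result = res;
	iocb->task_work_pending = 1;
}

/* Issuer's task work: runs the deferred handler, if one was handed back. */
static void model_task_work(struct model_kiocb *iocb)
{
	if (iocb->dio_complete)
		iocb->result = iocb->dio_complete(iocb->private);
	iocb->task_work_pending = 0;
}

/* dio handler's end_io: hand completion back, or note a workqueue punt. */
static void model_bio_end_io(struct model_kiocb *iocb, struct model_dio *dio)
{
	if (iocb->ki_flags & MODEL_IOCB_DIO_DEFER) {
		iocb->private = dio;
		iocb->dio_complete = model_dio_deferred_complete;
		iocb->ki_complete(iocb, 0);	/* 'res' is ignored here */
	} else {
		dio->punted = 1;
	}
}

/* Drive one write through the model and return the issuer's result. */
long model_run_write(int flags, ssize_t size, struct model_dio *dio)
{
	struct model_kiocb iocb = {
		.ki_flags = flags,
		.ki_complete = model_ki_complete,
		.result = -1,
	};

	dio->size = size;
	dio->completed = 0;
	dio->punted = 0;

	model_in_task = 0;		/* completion arrives from "IRQ" */
	model_bio_end_io(&iocb, dio);
	model_in_task = 1;		/* issuer is back in task context */
	if (iocb.task_work_pending)
		model_task_work(&iocb);
	return iocb.result;
}
```

With MODEL_IOCB_DIO_DEFER set, model_run_write() returns the size
reported by the deferred handler, which only ever runs on the "task"
side. Without the flag, the model just marks the dio as punted, standing
in for the queue_work() path that the real code keeps as its fallback.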