From patchwork Tue Jul 18 19:49:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13317666
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 1/5] iomap: simplify logic for when a dio can get completed inline
Date: Tue, 18 Jul 2023 13:49:16 -0600
Message-Id: <20230718194920.1472184-3-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230718194920.1472184-1-axboe@kernel.dk>
References: <20230718194920.1472184-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

Currently iomap gates this on !IOMAP_DIO_WRITE, but this isn't entirely
accurate. Some writes can complete just fine inline. One such example is
polled IO, where the completion always happens in task context.

Add IOMAP_DIO_INLINE_COMP which tells the completion side if we can
complete this dio inline, or if it needs punting to a workqueue. We set
this flag by default for any dio, and turn it off for unwritten extents
or blocks that require a sync at completion time.

Gate the inline completion on whether we're in a task or not as well.
This will always be true for polled IO, but for IRQ driven IO, the
completion context may not allow for inline completions.

Testing a basic QD 1..8 dio random write with polled IO with the
following fio job:

fio --name=polled-dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=$depth --hipri=1

yields:

        Stock   Patched         Diff
=======================================
QD1     180K    201K            +11%
QD2     356K    394K            +10%
QD4     608K    650K            +7%
QD8     827K    831K            +0.5%

which shows a nice win, particularly for lower queue depth writes.
This is expected, as higher queue depths will be busy polling
completions while the offloaded workqueue completions can happen in
parallel.
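
Reading aid, not part of the patch: the flag handling described above can
be modelled in plain userspace C. This is a minimal sketch under stated
assumptions. The COMP_* names and bit values are hypothetical stand-ins
for the IOMAP_DIO_* flags, the in_task argument stands in for the
kernel's in_task() check, and the sync-waiter branch is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the IOMAP_DIO_* flags; the values are arbitrary. */
#define COMP_INLINE	(1u << 0)	/* models IOMAP_DIO_INLINE_COMP */
#define COMP_UNWRITTEN	(1u << 1)	/* models IOMAP_DIO_UNWRITTEN */
#define COMP_NEED_SYNC	(1u << 2)	/* models IOMAP_DIO_NEED_SYNC */

enum comp_path { COMP_PATH_INLINE, COMP_PATH_WORKQUEUE };

/* Submission side: default to inline completion, clear it where unsupported. */
static unsigned int comp_submit_flags(bool unwritten_extent, bool dsync)
{
	unsigned int flags = COMP_INLINE;

	if (unwritten_extent) {
		flags |= COMP_UNWRITTEN;
		flags &= ~COMP_INLINE;
	}
	if (dsync) {
		flags |= COMP_NEED_SYNC;
		flags &= ~COMP_INLINE;
	}
	return flags;
}

/* Completion side: inline only if the flag survived and we are in task context. */
static enum comp_path comp_end_io(unsigned int flags, bool in_task)
{
	if ((flags & COMP_INLINE) && in_task)
		return COMP_PATH_INLINE;
	return COMP_PATH_WORKQUEUE;
}

/* Walk the cases named in the commit message. */
static void comp_demo(void)
{
	/* Plain write completing in task context (e.g. polled IO). */
	assert(comp_end_io(comp_submit_flags(false, false), true) == COMP_PATH_INLINE);
	/* Same write completing from IRQ context. */
	assert(comp_end_io(comp_submit_flags(false, false), false) == COMP_PATH_WORKQUEUE);
	/* Unwritten extent or dsync: never inline, even in task context. */
	assert(comp_end_io(comp_submit_flags(true, false), true) == COMP_PATH_WORKQUEUE);
	assert(comp_end_io(comp_submit_flags(false, true), true) == COMP_PATH_WORKQUEUE);
}
```

Calling comp_demo() from a main() exercises all four cases. The point of
the default-on flag is that every case needing extra completion work has
to opt out explicitly, rather than every write being punted.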

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index ea3b868c8355..6fa77094cf0a 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -20,6 +20,7 @@
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
  */
+#define IOMAP_DIO_INLINE_COMP	(1 << 27)
 #define IOMAP_DIO_WRITE_FUA	(1 << 28)
 #define IOMAP_DIO_NEED_SYNC	(1 << 29)
 #define IOMAP_DIO_WRITE		(1 << 30)
@@ -161,15 +162,15 @@ void iomap_dio_bio_end_io(struct bio *bio)
 			struct task_struct *waiter = dio->submit.waiter;
 			WRITE_ONCE(dio->submit.waiter, NULL);
 			blk_wake_io_task(waiter);
-		} else if (dio->flags & IOMAP_DIO_WRITE) {
+		} else if ((dio->flags & IOMAP_DIO_INLINE_COMP) && in_task()) {
+			WRITE_ONCE(dio->iocb->private, NULL);
+			iomap_dio_complete_work(&dio->aio.work);
+		} else {
 			struct inode *inode = file_inode(dio->iocb->ki_filp);
 
 			WRITE_ONCE(dio->iocb->private, NULL);
 			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
 			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
-		} else {
-			WRITE_ONCE(dio->iocb->private, NULL);
-			iomap_dio_complete_work(&dio->aio.work);
 		}
 	}
 
@@ -244,6 +245,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 
 	if (iomap->type == IOMAP_UNWRITTEN) {
 		dio->flags |= IOMAP_DIO_UNWRITTEN;
+		dio->flags &= ~IOMAP_DIO_INLINE_COMP;
 		need_zeroout = true;
 	}
 
@@ -500,7 +502,8 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->i_size = i_size_read(inode);
 	dio->dops = dops;
 	dio->error = 0;
-	dio->flags = 0;
+	/* default to inline completion, turned off when not supported */
+	dio->flags = IOMAP_DIO_INLINE_COMP;
 	dio->done_before = done_before;
 
 	dio->submit.iter = iter;
@@ -535,6 +538,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		/* for data sync or sync, we need sync completion processing */
 		if (iocb_is_dsync(iocb)) {
 			dio->flags |= IOMAP_DIO_NEED_SYNC;
+			dio->flags &= ~IOMAP_DIO_INLINE_COMP;
 
 			/*
 			 * For datasync only writes, we optimistically try

From patchwork Tue Jul 18 19:49:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13317667
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 2/5] fs: add IOCB flags related to passing back dio completions
Date: Tue, 18 Jul 2023 13:49:17 -0600
Message-Id: <20230718194920.1472184-4-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230718194920.1472184-1-axboe@kernel.dk>
References: <20230718194920.1472184-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

Async dio completions generally happen from hard/soft IRQ context, which
means that users like iomap may need to defer some of the completion
handling to a workqueue.
This is less efficient than having the original issuer handle it, like
we do for sync IO, and it adds latency to the completions.

Add IOCB_DIO_DEFER, which the issuer can set if it is able to safely
punt these completions to a safe context. If the dio handler is aware
of this flag, it assigns a callback handler in kiocb->dio_complete and
the associated data in kiocb->private. The issuer will then call this
handler with that data from task context.

No functional changes in this patch.

Signed-off-by: Jens Axboe
---
 include/linux/fs.h | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6867512907d6..115382f66d79 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -338,6 +338,16 @@ enum rw_hint {
 #define IOCB_NOIO		(1 << 20)
 /* can use bio alloc cache */
 #define IOCB_ALLOC_CACHE	(1 << 21)
+/*
+ * IOCB_DIO_DEFER can be set by the iocb owner, to indicate that the
+ * iocb completion can be passed back to the owner for execution from a safe
+ * context rather than needing to be punted through a workqueue. If this
+ * flag is set, the completion handling may set iocb->dio_complete to a
+ * handler, which the issuer will then call from task context to complete
+ * the processing of the iocb. iocb->private should then also be set to
+ * the argument being passed to this handler.
+ */
+#define IOCB_DIO_DEFER		(1 << 22)
 
 /* for use in trace events */
 #define TRACE_IOCB_STRINGS \
@@ -351,7 +361,8 @@ enum rw_hint {
 	{ IOCB_WRITE,		"WRITE" }, \
 	{ IOCB_WAITQ,		"WAITQ" }, \
 	{ IOCB_NOIO,		"NOIO" }, \
-	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }
+	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }, \
+	{ IOCB_DIO_DEFER,	"DIO_DEFER" }
 
 struct kiocb {
 	struct file		*ki_filp;
@@ -360,7 +371,22 @@ struct kiocb {
 	void			*private;
 	int			ki_flags;
 	u16			ki_ioprio; /* See linux/ioprio.h */
-	struct wait_page_queue	*ki_waitq; /* for async buffered IO */
+	union {
+		/*
+		 * Only used for async buffered reads, where it denotes the
+		 * page waitqueue associated with completing the read. Valid
+		 * IFF IOCB_WAITQ is set.
+		 */
+		struct wait_page_queue	*ki_waitq;
+		/*
+		 * Can be used for O_DIRECT IO, where the completion handling
+		 * is punted back to the issuer of the IO. May only be set
+		 * if IOCB_DIO_DEFER is set by the issuer, and the issuer must
+		 * then check for presence of this handler when ki_complete is
+		 * invoked.
+		 */
+		ssize_t (*dio_complete)(void *data);
+	};
 };
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)

From patchwork Tue Jul 18 19:49:18 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13317668
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 3/5] io_uring/rw: add write support for IOCB_DIO_DEFER
Date: Tue, 18 Jul 2023 13:49:18 -0600
Message-Id: <20230718194920.1472184-5-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230718194920.1472184-1-axboe@kernel.dk>
References: <20230718194920.1472184-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

If the filesystem dio handler understands IOCB_DIO_DEFER, we'll get a
kiocb->ki_complete() callback with kiocb->dio_complete set.
In that case, rather than complete the IO directly through task_work,
queue up an intermediate task_work handler that first processes this
callback and then immediately completes the request.

For XFS, this avoids a punt through a workqueue, which is a lot less
efficient and adds latency to lower queue depth (or sync) O_DIRECT
writes.

Signed-off-by: Jens Axboe
---
 io_uring/rw.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1bce2208b65c..4ed378c70249 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -285,6 +285,14 @@ static inline int io_fixup_rw_res(struct io_kiocb *req, long res)
 
 void io_req_rw_complete(struct io_kiocb *req, struct io_tw_state *ts)
 {
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+
+	if (rw->kiocb.dio_complete) {
+		long res = rw->kiocb.dio_complete(rw->kiocb.private);
+
+		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	}
+
 	io_req_io_end(req);
 
 	if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
@@ -300,9 +308,11 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
 	struct io_rw *rw = container_of(kiocb, struct io_rw, kiocb);
 	struct io_kiocb *req = cmd_to_io_kiocb(rw);
 
-	if (__io_complete_rw_common(req, res))
-		return;
-	io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	if (!rw->kiocb.dio_complete) {
+		if (__io_complete_rw_common(req, res))
+			return;
+		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	}
 	req->io_task_work.func = io_req_rw_complete;
 	__io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE);
 }
@@ -914,7 +924,13 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 		__sb_writers_release(file_inode(req->file)->i_sb,
 					SB_FREEZE_WRITE);
 	}
-	kiocb->ki_flags |= IOCB_WRITE;
+
+	/*
+	 * Set IOCB_DIO_DEFER, stating that our handler groks deferring the
+	 * completion to task context.
+	 */
+	kiocb->ki_flags |= IOCB_WRITE | IOCB_DIO_DEFER;
+	kiocb->dio_complete = NULL;
 
 	if (likely(req->file->f_op->write_iter))
 		ret2 = call_write_iter(req->file, kiocb, &s->iter);

From patchwork Tue Jul 18 19:49:19 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13317669
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 4/5] iomap: add local 'iocb' variable in iomap_dio_bio_end_io()
Date: Tue, 18 Jul 2023 13:49:19 -0600
Message-Id: <20230718194920.1472184-6-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230718194920.1472184-1-axboe@kernel.dk>
References: <20230718194920.1472184-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

We use this multiple times, so add a local variable for the kiocb.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6fa77094cf0a..92b9b9db8b67 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -158,6 +158,8 @@ void iomap_dio_bio_end_io(struct bio *bio)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
 
 	if (atomic_dec_and_test(&dio->ref)) {
+		struct kiocb *iocb = dio->iocb;
+
 		if (dio->wait_for_completion) {
 			struct task_struct *waiter = dio->submit.waiter;
 			WRITE_ONCE(dio->submit.waiter, NULL);
@@ -166,9 +168,9 @@ void iomap_dio_bio_end_io(struct bio *bio)
 			WRITE_ONCE(dio->iocb->private, NULL);
 			iomap_dio_complete_work(&dio->aio.work);
 		} else {
-			struct inode *inode = file_inode(dio->iocb->ki_filp);
+			struct inode *inode = file_inode(iocb->ki_filp);
 
-			WRITE_ONCE(dio->iocb->private, NULL);
+			WRITE_ONCE(iocb->private, NULL);
 			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
 			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
 		}

From patchwork Tue Jul 18 19:49:20 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13317670
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 5/5] iomap: support IOCB_DIO_DEFER
Date: Tue, 18 Jul 2023 13:49:20 -0600
Message-Id: <20230718194920.1472184-7-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230718194920.1472184-1-axboe@kernel.dk>
References: <20230718194920.1472184-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

If IOCB_DIO_DEFER is set, utilize that to set kiocb->dio_complete
handler and data for that callback. Rather than punt the completion
to a workqueue, we pass back the handler and data to the issuer and
will get a callback from a safe task context.

Using the following fio job to randomly dio write 4k blocks at
queue depths of 1..16:

fio --name=dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=16

shows the following results before and after this patch:

        Stock   Patched         Diff
=======================================
QD1     155K    162K            + 4.5%
QD2     290K    313K            + 7.9%
QD4     533K    597K            +12.0%
QD8     604K    827K            +36.9%
QD16    615K    845K            +37.4%

which shows nice wins all around. If we factored in per-IOP efficiency,
the wins look even nicer. This becomes apparent as queue depth rises,
as the offloaded workqueue completions run out of steam.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 92b9b9db8b67..ed615177e1f6 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -131,6 +131,11 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 }
 EXPORT_SYMBOL_GPL(iomap_dio_complete);
 
+static ssize_t iomap_dio_deferred_complete(void *data)
+{
+	return iomap_dio_complete(data);
+}
+
 static void iomap_dio_complete_work(struct work_struct *work)
 {
 	struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work);
@@ -167,6 +172,25 @@ void iomap_dio_bio_end_io(struct bio *bio)
 		} else if ((dio->flags & IOMAP_DIO_INLINE_COMP) && in_task()) {
 			WRITE_ONCE(dio->iocb->private, NULL);
 			iomap_dio_complete_work(&dio->aio.work);
+		} else if ((dio->flags & IOMAP_DIO_INLINE_COMP) &&
+			   (iocb->ki_flags & IOCB_DIO_DEFER)) {
+			/* only polled IO cares about private cleared */
+			iocb->private = dio;
+			iocb->dio_complete = iomap_dio_deferred_complete;
+			/*
+			 * Invoke ->ki_complete() directly. We've assigned
+			 * our dio_complete callback handler, and since the
+			 * issuer set IOCB_DIO_DEFER, we know their
+			 * ki_complete handler will notice ->dio_complete
+			 * being set and will defer calling that handler
+			 * until it can be done from a safe task context.
+			 *
+			 * Note that the 'res' being passed in here is
+			 * not important for this case. The actual completion
+			 * value of the request will be gotten from dio_complete
+			 * when that is run by the issuer.
+			 */
+			iocb->ki_complete(iocb, 0);
 		} else {
 			struct inode *inode = file_inode(iocb->ki_filp);
 
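
Reading aid, not part of the series: the hand-off between the dio layer
and the issuer can be modelled in plain userspace C. This is a rough
sketch under stated assumptions. Every defer_* name is hypothetical, the
structs keep only the fields the hand-off touches, the sync-waiter
branch is omitted, and the workqueue path is reported rather than run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kiocb: only what the hand-off needs. */
struct defer_iocb {
	bool defer_ok;				/* models IOCB_DIO_DEFER being set */
	void *private;				/* argument for dio_complete */
	long (*dio_complete)(void *data);	/* NULL unless completion was deferred */
	bool task_work_queued;			/* issuer has task-context work pending */
	long result;				/* value the issuer finally reports */
};

/* Hypothetical stand-in for struct iomap_dio. */
struct defer_dio {
	bool inline_comp;			/* models IOMAP_DIO_INLINE_COMP */
	long bytes_done;
	struct defer_iocb *iocb;
};

enum defer_path { DEFER_PATH_INLINE, DEFER_PATH_ISSUER, DEFER_PATH_WORKQUEUE };

/* Final completion step; in this model it only reports the byte count. */
static long defer_dio_complete(void *data)
{
	struct defer_dio *dio = data;

	return dio->bytes_done;
}

/* Issuer's ki_complete(): may run in IRQ context, so it only queues work. */
static void defer_ki_complete(struct defer_iocb *iocb, long res)
{
	if (!iocb->dio_complete)
		iocb->result = res;	/* no callback set: res is already final */
	iocb->task_work_queued = true;
}

/* Issuer's task-context handler: run the deferred callback if one was set. */
static void defer_task_work(struct defer_iocb *iocb)
{
	if (iocb->dio_complete)
		iocb->result = iocb->dio_complete(iocb->private);
	iocb->task_work_queued = false;
}

/* Last-bio completion, using the branch order of the patched handler. */
static enum defer_path defer_end_io(struct defer_dio *dio, bool in_task)
{
	struct defer_iocb *iocb = dio->iocb;

	if (dio->inline_comp && in_task) {
		iocb->result = defer_dio_complete(dio);
		return DEFER_PATH_INLINE;
	}
	if (dio->inline_comp && iocb->defer_ok) {
		iocb->private = dio;
		iocb->dio_complete = defer_dio_complete;
		defer_ki_complete(iocb, 0);	/* res is a placeholder here */
		return DEFER_PATH_ISSUER;
	}
	return DEFER_PATH_WORKQUEUE;
}

/* Walk one IRQ-context completion through the deferred path. */
static void defer_demo(void)
{
	struct defer_iocb iocb = { .defer_ok = true };
	struct defer_dio dio = { .inline_comp = true, .bytes_done = 4096, .iocb = &iocb };

	assert(defer_end_io(&dio, false) == DEFER_PATH_ISSUER);
	assert(iocb.task_work_queued);
	defer_task_work(&iocb);
	assert(iocb.result == 4096);
}
```

Calling defer_demo() from a main() walks the deferred case. In the model,
as in the series, the placeholder 0 passed to ki_complete never reaches
the result, because the issuer overwrites it with the callback's return
value once it runs in task context.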