From patchwork Thu Jul 20 18:13:03 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13320954
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 1/8] iomap: cleanup up iomap_dio_bio_end_io()
Date: Thu, 20 Jul 2023 12:13:03 -0600
Message-Id: <20230720181310.71589-2-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230720181310.71589-1-axboe@kernel.dk>
References: <20230720181310.71589-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

Make the logic a bit easier to follow:

1) Add a release_bio out path, as everybody needs to touch that, and
   have our bio ref check jump there if it's non-zero.
2) Add a kiocb local variable.
3) Add comments for each of the three conditions (sync, inline, or
   async workqueue punt).

No functional changes in this patch.

Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 fs/iomap/direct-io.c | 46 +++++++++++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index ea3b868c8355..0ce60e80c901 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -152,27 +152,43 @@ void iomap_dio_bio_end_io(struct bio *bio)
 {
 	struct iomap_dio *dio = bio->bi_private;
 	bool should_dirty = (dio->flags & IOMAP_DIO_DIRTY);
+	struct kiocb *iocb = dio->iocb;

 	if (bio->bi_status)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
+	if (!atomic_dec_and_test(&dio->ref))
+		goto release_bio;

-	if (atomic_dec_and_test(&dio->ref)) {
-		if (dio->wait_for_completion) {
-			struct task_struct *waiter = dio->submit.waiter;
-			WRITE_ONCE(dio->submit.waiter, NULL);
-			blk_wake_io_task(waiter);
-		} else if (dio->flags & IOMAP_DIO_WRITE) {
-			struct inode *inode = file_inode(dio->iocb->ki_filp);
-
-			WRITE_ONCE(dio->iocb->private, NULL);
-			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
-			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
-		} else {
-			WRITE_ONCE(dio->iocb->private, NULL);
-			iomap_dio_complete_work(&dio->aio.work);
-		}
+	/*
+	 * Synchronous dio, task itself will handle any completion work
+	 * that needs after IO. All we need to do is wake the task.
+	 */
+	if (dio->wait_for_completion) {
+		struct task_struct *waiter = dio->submit.waiter;
+
+		WRITE_ONCE(dio->submit.waiter, NULL);
+		blk_wake_io_task(waiter);
+		goto release_bio;
+	}
+
+	/* Read completion can always complete inline. */
+	if (!(dio->flags & IOMAP_DIO_WRITE)) {
+		WRITE_ONCE(iocb->private, NULL);
+		iomap_dio_complete_work(&dio->aio.work);
+		goto release_bio;
 	}

+	/*
+	 * Async DIO completion that requires filesystem level completion work
+	 * gets punted to a work queue to complete as the operation may require
+	 * more IO to be issued to finalise filesystem metadata changes or
+	 * guarantee data integrity.
+	 */
+	WRITE_ONCE(iocb->private, NULL);
+	INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
+	queue_work(file_inode(iocb->ki_filp)->i_sb->s_dio_done_wq,
+			&dio->aio.work);
+release_bio:
 	if (should_dirty) {
 		bio_check_pages_dirty(bio);
 	} else {

From patchwork Thu Jul 20 18:13:04 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13320953
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 2/8] iomap: add IOMAP_DIO_INLINE_COMP
Date: Thu, 20 Jul 2023 12:13:04 -0600
Message-Id: <20230720181310.71589-3-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230720181310.71589-1-axboe@kernel.dk>
References: <20230720181310.71589-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

Rather than gate whether or not we need to punt a dio completion to a
workqueue on whether the IO is a write or not, add an explicit flag for
it. For now we treat them the same, reads always set the flags and async
writes do not.

No functional changes in this patch.

Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 fs/iomap/direct-io.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 0ce60e80c901..c654612b24e5 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -20,6 +20,7 @@
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
  */
+#define IOMAP_DIO_INLINE_COMP	(1 << 27)
 #define IOMAP_DIO_WRITE_FUA	(1 << 28)
 #define IOMAP_DIO_NEED_SYNC	(1 << 29)
 #define IOMAP_DIO_WRITE		(1 << 30)
@@ -171,8 +172,10 @@ void iomap_dio_bio_end_io(struct bio *bio)
 		goto release_bio;
 	}

-	/* Read completion can always complete inline. */
-	if (!(dio->flags & IOMAP_DIO_WRITE)) {
+	/*
+	 * Flagged with IOMAP_DIO_INLINE_COMP, we can complete it inline
+	 */
+	if (dio->flags & IOMAP_DIO_INLINE_COMP) {
 		WRITE_ONCE(iocb->private, NULL);
 		iomap_dio_complete_work(&dio->aio.work);
 		goto release_bio;
@@ -527,6 +530,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		iomi.flags |= IOMAP_NOWAIT;

 	if (iov_iter_rw(iter) == READ) {
+		/* reads can always complete inline */
+		dio->flags |= IOMAP_DIO_INLINE_COMP;
+
 		if (iomi.pos >= dio->i_size)
 			goto out_free_dio;


From patchwork Thu Jul 20 18:13:05 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13320955
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 3/8] iomap: treat a write through cache the same as FUA
Date: Thu, 20 Jul 2023 12:13:05 -0600
Message-Id: <20230720181310.71589-4-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230720181310.71589-1-axboe@kernel.dk>
References: <20230720181310.71589-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

Whether we have a write back cache and are using FUA or don't have
a write back cache at all is the same situation. Treat them the same.

This makes the IOMAP_DIO_WRITE_FUA name a bit misleading, as we have
two cases that provide stable writes:

1) Volatile write cache with FUA writes
2) Normal write without a volatile write cache

Rename that flag to IOMAP_DIO_STABLE_WRITE to make that clearer, and
update some of the FUA comments as well.

Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 fs/iomap/direct-io.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index c654612b24e5..9f97d0d03724 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -21,7 +21,7 @@
  * iomap.h:
  */
 #define IOMAP_DIO_INLINE_COMP	(1 << 27)
-#define IOMAP_DIO_WRITE_FUA	(1 << 28)
+#define IOMAP_DIO_STABLE_WRITE	(1 << 28)
 #define IOMAP_DIO_NEED_SYNC	(1 << 29)
 #define IOMAP_DIO_WRITE		(1 << 30)
 #define IOMAP_DIO_DIRTY		(1 << 31)
@@ -222,7 +222,7 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 /*
  * Figure out the bio's operation flags from the dio request, the
  * mapping, and whether or not we want FUA. Note that we can end up
- * clearing the WRITE_FUA flag in the dio request.
+ * clearing the STABLE_WRITE flag in the dio request.
  */
 static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
 		const struct iomap *iomap, bool use_fua)
@@ -236,7 +236,7 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
 	if (use_fua)
 		opflags |= REQ_FUA;
 	else
-		dio->flags &= ~IOMAP_DIO_WRITE_FUA;
+		dio->flags &= ~IOMAP_DIO_STABLE_WRITE;

 	return opflags;
 }
@@ -276,11 +276,13 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		 * Use a FUA write if we need datasync semantics, this is a pure
 		 * data IO that doesn't require any metadata updates (including
 		 * after IO completion such as unwritten extent conversion) and
-		 * the underlying device supports FUA. This allows us to avoid
-		 * cache flushes on IO completion.
+		 * the underlying device either supports FUA or doesn't have
+		 * a volatile write cache. This allows us to avoid cache flushes
+		 * on IO completion.
 		 */
 		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
-		    (dio->flags & IOMAP_DIO_WRITE_FUA) && bdev_fua(iomap->bdev))
+		    (dio->flags & IOMAP_DIO_STABLE_WRITE) &&
+		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
 			use_fua = true;
 	}

@@ -560,12 +562,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,

 			/*
 			 * For datasync only writes, we optimistically try
-			 * using FUA for this IO. Any non-FUA write that
-			 * occurs will clear this flag, hence we know before
-			 * completion whether a cache flush is necessary.
+			 * using STABLE_WRITE for this IO. Stable writes are
+			 * either FUA with a write cache, or a normal write to
+			 * a device without a volatile write cache. For the
+			 * former, Any non-FUA write that occurs will clear this
+			 * flag, hence we know before completion whether a cache
+			 * flush is necessary.
 			 */
 			if (!(iocb->ki_flags & IOCB_SYNC))
-				dio->flags |= IOMAP_DIO_WRITE_FUA;
+				dio->flags |= IOMAP_DIO_STABLE_WRITE;
 		}

 		/*
@@ -627,10 +632,10 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		iomap_dio_set_error(dio, ret);

 	/*
-	 * If all the writes we issued were FUA, we don't need to flush the
+	 * If all the writes we issued were stable, we don't need to flush the
 	 * cache on IO completion. Clear the sync flag for this case.
 	 */
-	if (dio->flags & IOMAP_DIO_WRITE_FUA)
+	if (dio->flags & IOMAP_DIO_STABLE_WRITE)
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;

 	WRITE_ONCE(iocb->private, dio->submit.poll_bio);

From patchwork Thu Jul 20 18:13:06 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13320956
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 4/8] iomap: completed polled IO
 inline
Date: Thu, 20 Jul 2023 12:13:06 -0600
Message-Id: <20230720181310.71589-5-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230720181310.71589-1-axboe@kernel.dk>
References: <20230720181310.71589-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

Polled IO is only allowed for conditions where task completion is safe
anyway, so we can always complete it inline. This cannot easily be
checked with a submission side flag, as the block layer may clear the
polled flag and turn it into a regular IO instead. Hence we need to
check this at completion time. If REQ_POLLED is still set, then we know
that this IO was successfully polled, and is completing in task context.

Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 fs/iomap/direct-io.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 9f97d0d03724..c3ea1839628f 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -173,9 +173,19 @@ void iomap_dio_bio_end_io(struct bio *bio)
 	}

 	/*
-	 * Flagged with IOMAP_DIO_INLINE_COMP, we can complete it inline
+	 * Flagged with IOMAP_DIO_INLINE_COMP, we can complete it inline.
+	 * Ditto for polled requests - if the flag is still at completion
+	 * time, then we know the request was actually polled and completion
+	 * is called from the task itself. This is why we need to check it
+	 * here rather than flag it at issue time.
 	 */
-	if (dio->flags & IOMAP_DIO_INLINE_COMP) {
+	if ((dio->flags & IOMAP_DIO_INLINE_COMP) || (bio->bi_opf & REQ_POLLED)) {
+		/*
+		 * For polled IO, we need to clear ->private as it points to
+		 * the bio being polled for. The completion side uses it to
+		 * know if a given request has been found yet or not. For
+		 * non-polled IO, ->private isn't applicable.
+		 */
 		WRITE_ONCE(iocb->private, NULL);
 		iomap_dio_complete_work(&dio->aio.work);
 		goto release_bio;

From patchwork Thu Jul 20 18:13:07 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13320957
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 5/8] iomap: only set iocb->private for polled bio
Date: Thu, 20 Jul 2023 12:13:07 -0600
Message-Id: <20230720181310.71589-6-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230720181310.71589-1-axboe@kernel.dk>
References: <20230720181310.71589-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

iocb->private is only used for polled IO, where the completer will
find the bio to poll through that field.

Assign it when we're submitting a polled bio, and get rid of the
dio->poll_bio indirection.

Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 fs/iomap/direct-io.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index c3ea1839628f..cce9af019705 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -42,7 +42,6 @@ struct iomap_dio {
 		struct {
 			struct iov_iter		*iter;
 			struct task_struct	*waiter;
-			struct bio		*poll_bio;
 		} submit;

 		/* used for aio completion: */
@@ -64,12 +63,14 @@ static struct bio *iomap_dio_alloc_bio(const struct iomap_iter *iter,
 static void iomap_dio_submit_bio(const struct iomap_iter *iter,
 		struct iomap_dio *dio, struct bio *bio, loff_t pos)
 {
+	struct kiocb *iocb = dio->iocb;
+
 	atomic_inc(&dio->ref);

 	/* Sync dio can't be polled reliably */
-	if ((dio->iocb->ki_flags & IOCB_HIPRI) && !is_sync_kiocb(dio->iocb)) {
-		bio_set_polled(bio, dio->iocb);
-		dio->submit.poll_bio = bio;
+	if ((iocb->ki_flags & IOCB_HIPRI) && !is_sync_kiocb(iocb)) {
+		bio_set_polled(bio, iocb);
+		WRITE_ONCE(iocb->private, bio);
 	}

 	if (dio->dops && dio->dops->submit_io)
@@ -197,7 +198,6 @@ void iomap_dio_bio_end_io(struct bio *bio)
 	 * more IO to be issued to finalise filesystem metadata changes or
 	 * guarantee data integrity.
 	 */
-	WRITE_ONCE(iocb->private, NULL);
 	INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
 	queue_work(file_inode(iocb->ki_filp)->i_sb->s_dio_done_wq,
 			&dio->aio.work);
@@ -536,7 +536,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,

 	dio->submit.iter = iter;
 	dio->submit.waiter = current;
-	dio->submit.poll_bio = NULL;

 	if (iocb->ki_flags & IOCB_NOWAIT)
 		iomi.flags |= IOMAP_NOWAIT;
@@ -648,8 +647,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (dio->flags & IOMAP_DIO_STABLE_WRITE)
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;

-	WRITE_ONCE(iocb->private, dio->submit.poll_bio);
-
 	/*
 	 * We are about to drop our additional submission reference, which
 	 * might be the last reference to the dio. There are three different

From patchwork Thu Jul 20 18:13:08 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13320958
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 6/8] fs: add IOCB flags related to passing back dio completions
Date: Thu, 20 Jul 2023 12:13:08 -0600
Message-Id: <20230720181310.71589-7-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230720181310.71589-1-axboe@kernel.dk>
References: <20230720181310.71589-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

Async dio completions generally happen from hard/soft IRQ context, which
means that users like iomap may need to defer some of the completion
handling to a workqueue.
This is less efficient than having the original issuer handle it, like
we do for sync IO, and it adds latency to the completions.

Add IOCB_DIO_DEFER, which the issuer can set if it is able to safely
punt these completions to a safe context. If the dio handler is aware
of this flag, assign a callback handler in kiocb->dio_complete and
associated data in kiocb->private. The issuer will then call this
handler with that data from task context.

No functional changes in this patch.

Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig
---
 include/linux/fs.h | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6867512907d6..2c589418a078 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -338,6 +338,20 @@ enum rw_hint {
 #define IOCB_NOIO		(1 << 20)
 /* can use bio alloc cache */
 #define IOCB_ALLOC_CACHE	(1 << 21)
+/*
+ * IOCB_DIO_DEFER can be set by the iocb owner, to indicate that the
+ * iocb completion can be passed back to the owner for execution from a safe
+ * context rather than needing to be punted through a workqueue. If this
+ * flag is set, the completion handling may set iocb->dio_complete to a
+ * handler, which the issuer will then call from task context to complete
+ * the processing of the iocb. iocb->private should then also be set to
+ * the argument being passed to this handler. Note that while this provides
+ * a task context for the dio_complete() callback, it should only be used
+ * on the completion side for non-IO generating completions. It's fine to
+ * call blocking functions from this callback, but they should not wait for
+ * unrelated IO (like cache flushing, new IO generation, etc).
+ */
+#define IOCB_DIO_DEFER		(1 << 22)
 
 /* for use in trace events */
 #define TRACE_IOCB_STRINGS \
@@ -351,7 +365,8 @@ enum rw_hint {
 	{ IOCB_WRITE,		"WRITE" }, \
 	{ IOCB_WAITQ,		"WAITQ" }, \
 	{ IOCB_NOIO,		"NOIO" }, \
-	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }
+	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }, \
+	{ IOCB_DIO_DEFER,	"DIO_DEFER" }
 
 struct kiocb {
 	struct file		*ki_filp;
@@ -360,7 +375,22 @@ struct kiocb {
 	void			*private;
 	int			ki_flags;
 	u16			ki_ioprio; /* See linux/ioprio.h */
-	struct wait_page_queue	*ki_waitq; /* for async buffered IO */
+	union {
+		/*
+		 * Only used for async buffered reads, where it denotes the
+		 * page waitqueue associated with completing the read. Valid
+		 * IFF IOCB_WAITQ is set.
+		 */
+		struct wait_page_queue	*ki_waitq;
+		/*
+		 * Can be used for O_DIRECT IO, where the completion handling
+		 * is punted back to the issuer of the IO. May only be set
+		 * if IOCB_DIO_DEFER is set by the issuer, and the issuer must
+		 * then check for presence of this handler when ki_complete is
+		 * invoked.
+		 */
+		ssize_t (*dio_complete)(void *data);
+	};
 };
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)

From patchwork Thu Jul 20 18:13:09 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13320959
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 7/8] io_uring/rw: add write support for IOCB_DIO_DEFER
Date: Thu, 20 Jul 2023 12:13:09 -0600
Message-Id: <20230720181310.71589-8-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230720181310.71589-1-axboe@kernel.dk>
References: <20230720181310.71589-1-axboe@kernel.dk>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: io-uring@vger.kernel.org

If the filesystem dio handler understands IOCB_DIO_DEFER, we'll get a
kiocb->ki_complete() callback with kiocb->dio_complete set.
In that case, rather than complete the IO directly through task_work,
queue up an intermediate task_work handler that first processes this
callback and then immediately completes the request.

For XFS, this avoids a punt through a workqueue, which is a lot less
efficient and adds latency to lower queue depth (or sync) O_DIRECT
writes.

Only do this for non-polled IO, as polled IO doesn't need this kind
of deferral as it always completes within the task itself. This then
avoids a check for deferral in the polled IO completion handler.

Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 io_uring/rw.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1bce2208b65c..f4f700383b4e 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -285,6 +285,14 @@ static inline int io_fixup_rw_res(struct io_kiocb *req, long res)
 
 void io_req_rw_complete(struct io_kiocb *req, struct io_tw_state *ts)
 {
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+
+	if (rw->kiocb.dio_complete) {
+		long res = rw->kiocb.dio_complete(rw->kiocb.private);
+
+		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	}
+
 	io_req_io_end(req);
 
 	if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
@@ -300,9 +308,11 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
 	struct io_rw *rw = container_of(kiocb, struct io_rw, kiocb);
 	struct io_kiocb *req = cmd_to_io_kiocb(rw);
 
-	if (__io_complete_rw_common(req, res))
-		return;
-	io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	if (!rw->kiocb.dio_complete) {
+		if (__io_complete_rw_common(req, res))
+			return;
+		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	}
 	req->io_task_work.func = io_req_rw_complete;
 	__io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE);
 }
@@ -916,6 +926,17 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	}
 	kiocb->ki_flags |= IOCB_WRITE;
 
+	/*
+	 * For non-polled IO, set IOCB_DIO_DEFER, stating that our handler
+	 * groks deferring the completion to task context. This isn't
+	 * necessary or useful for polled IO as that can always complete
+	 * directly.
+	 */
+	if (!(kiocb->ki_flags & IOCB_HIPRI)) {
+		kiocb->ki_flags |= IOCB_DIO_DEFER;
+		kiocb->dio_complete = NULL;
+	}
+
 	if (likely(req->file->f_op->write_iter))
 		ret2 = call_write_iter(req->file, kiocb, &s->iter);
 	else if (req->file->f_op->write)

From patchwork Thu Jul 20 18:13:10 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13320960
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 8/8] iomap: support IOCB_DIO_DEFER
Date: Thu, 20 Jul 2023 12:13:10 -0600
Message-Id: <20230720181310.71589-9-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230720181310.71589-1-axboe@kernel.dk>
References: <20230720181310.71589-1-axboe@kernel.dk>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: io-uring@vger.kernel.org

If IOCB_DIO_DEFER is set, utilize that to set kiocb->dio_complete handler
and data for that callback. Rather than punt the completion to a
workqueue, we pass back the handler and data to the issuer and will get
a callback from a safe task context.

Using the following fio job to randomly dio write 4k blocks at
queue depths of 1..16:

fio --name=dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=$depth

shows the following results before and after this patch:

        Stock   Patched   Diff
=======================================
QD1     155K    162K      + 4.5%
QD2     290K    313K      + 7.9%
QD4     533K    597K      +12.0%
QD8     604K    827K      +36.9%
QD16    615K    845K      +37.4%

which shows nice wins all around. If we factored in per-IOP efficiency,
the wins look even nicer. This becomes apparent as queue depth rises,
as the offloaded workqueue completions run out of steam.

Signed-off-by: Jens Axboe
Reviewed-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
---
 fs/iomap/direct-io.c | 54 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index cce9af019705..de86680968a4 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -20,6 +20,7 @@
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
  */
+#define IOMAP_DIO_DEFER_COMP	(1 << 26)
 #define IOMAP_DIO_INLINE_COMP	(1 << 27)
 #define IOMAP_DIO_STABLE_WRITE	(1 << 28)
 #define IOMAP_DIO_NEED_SYNC	(1 << 29)
@@ -132,6 +133,11 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 }
 EXPORT_SYMBOL_GPL(iomap_dio_complete);
 
+static ssize_t iomap_dio_deferred_complete(void *data)
+{
+	return iomap_dio_complete(data);
+}
+
 static void iomap_dio_complete_work(struct work_struct *work)
 {
 	struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work);
@@ -192,6 +198,31 @@ void iomap_dio_bio_end_io(struct bio *bio)
 		goto release_bio;
 	}
 
+	/*
+	 * If this dio is flagged with IOMAP_DIO_DEFER_COMP, then schedule
+	 * our completion that way to avoid an async punt to a workqueue.
+	 */
+	if (dio->flags & IOMAP_DIO_DEFER_COMP) {
+		/* only polled IO cares about private cleared */
+		iocb->private = dio;
+		iocb->dio_complete = iomap_dio_deferred_complete;
+
+		/*
+		 * Invoke ->ki_complete() directly. We've assigned our
+		 * dio_complete callback handler, and since the issuer set
+		 * IOCB_DIO_DEFER, we know their ki_complete handler will
+		 * notice ->dio_complete being set and will defer calling that
+		 * handler until it can be done from a safe task context.
+		 *
+		 * Note that the 'res' being passed in here is not important
+		 * for this case. The actual completion value of the request
+		 * will be gotten from dio_complete when that is run by the
+		 * issuer.
+		 */
+		iocb->ki_complete(iocb, 0);
+		goto release_bio;
+	}
+
 	/*
 	 * Async DIO completion that requires filesystem level completion work
 	 * gets punted to a work queue to complete as the operation may require
@@ -288,12 +319,17 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		 * after IO completion such as unwritten extent conversion) and
 		 * the underlying device either supports FUA or doesn't have
 		 * a volatile write cache. This allows us to avoid cache flushes
-		 * on IO completion.
+		 * on IO completion. If we can't use stable writes and need to
+		 * sync, disable in-task completions as dio completion will
+		 * need to call generic_write_sync() which will do a blocking
+		 * fsync / cache flush call.
 		 */
 		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
 		    (dio->flags & IOMAP_DIO_STABLE_WRITE) &&
 		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
 			use_fua = true;
+		else if (dio->flags & IOMAP_DIO_NEED_SYNC)
+			dio->flags &= ~IOMAP_DIO_DEFER_COMP;
 	}
 
 	/*
@@ -319,6 +355,13 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		pad = pos & (fs_block_size - 1);
 		if (pad)
 			iomap_dio_zero(iter, dio, pos - pad, pad);
+
+		/*
+		 * If need_zeroout is set, then this is a new or unwritten
+		 * extent. These need extra handling at completion time, so
+		 * disable in-task deferred completion for those.
+		 */
+		dio->flags &= ~IOMAP_DIO_DEFER_COMP;
 	}
 
 	/*
@@ -557,6 +600,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		iomi.flags |= IOMAP_WRITE;
 		dio->flags |= IOMAP_DIO_WRITE;
 
+		/*
+		 * Flag as supporting deferred completions, if the issuer
+		 * groks it. This can avoid a workqueue punt for writes.
+		 * We may later clear this flag if we need to do other IO
+		 * as part of this IO completion.
+		 */
+		if (iocb->ki_flags & IOCB_DIO_DEFER)
+			dio->flags |= IOMAP_DIO_DEFER_COMP;
+
 		if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
 			ret = -EAGAIN;
 			if (iomi.pos >= dio->i_size ||