From patchwork Wed Jul 19 19:54:12 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13319444
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 1/6] iomap: clean up iomap_dio_bio_end_io()
Date: Wed, 19 Jul 2023 13:54:12 -0600
Message-Id: <20230719195417.1704513-2-axboe@kernel.dk>
In-Reply-To: <20230719195417.1704513-1-axboe@kernel.dk>
References: <20230719195417.1704513-1-axboe@kernel.dk>

Make the logic a bit easier to follow:

1) Add a release_bio out path, as everybody needs to touch that, and
   have our bio ref check jump there if it's non-zero.
2) Add a kiocb local variable.
3) Add comments for each of the three conditions (sync, inline, or
   async workqueue punt).

No functional changes in this patch.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index ea3b868c8355..1c32f734c767 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -152,27 +152,40 @@ void iomap_dio_bio_end_io(struct bio *bio)
 {
 	struct iomap_dio *dio = bio->bi_private;
 	bool should_dirty = (dio->flags & IOMAP_DIO_DIRTY);
+	struct kiocb *iocb = dio->iocb;

 	if (bio->bi_status)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
+	if (!atomic_dec_and_test(&dio->ref))
+		goto release_bio;

-	if (atomic_dec_and_test(&dio->ref)) {
-		if (dio->wait_for_completion) {
-			struct task_struct *waiter = dio->submit.waiter;
-			WRITE_ONCE(dio->submit.waiter, NULL);
-			blk_wake_io_task(waiter);
-		} else if (dio->flags & IOMAP_DIO_WRITE) {
-			struct inode *inode = file_inode(dio->iocb->ki_filp);
-
-			WRITE_ONCE(dio->iocb->private, NULL);
-			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
-			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
-		} else {
-			WRITE_ONCE(dio->iocb->private, NULL);
-			iomap_dio_complete_work(&dio->aio.work);
-		}
+	/*
+	 * Synchronous dio, task itself will handle any completion work
+	 * that needs to be done after IO. All we need to do is wake the task.
+	 */
+	if (dio->wait_for_completion) {
+		struct task_struct *waiter = dio->submit.waiter;
+		WRITE_ONCE(dio->submit.waiter, NULL);
+		blk_wake_io_task(waiter);
+		goto release_bio;
+	}
+
+	/*
+	 * If this dio is an async write, queue completion work for async
+	 * handling. Reads can always complete inline.
+	 */
+	if (dio->flags & IOMAP_DIO_WRITE) {
+		struct inode *inode = file_inode(iocb->ki_filp);
+
+		WRITE_ONCE(iocb->private, NULL);
+		INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
+		queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
+	} else {
+		WRITE_ONCE(iocb->private, NULL);
+		iomap_dio_complete_work(&dio->aio.work);
 	}

+release_bio:
 	if (should_dirty) {
 		bio_check_pages_dirty(bio);
 	} else {

From patchwork Wed Jul 19 19:54:13 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13319443
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 2/6] iomap: add IOMAP_DIO_INLINE_COMP
Date: Wed, 19 Jul 2023 13:54:13 -0600
Message-Id: <20230719195417.1704513-3-axboe@kernel.dk>
In-Reply-To: <20230719195417.1704513-1-axboe@kernel.dk>
References: <20230719195417.1704513-1-axboe@kernel.dk>

Rather than gate whether or not we need to punt a dio completion to a
workqueue on whether the IO is a write, add an explicit flag for it.
For now we treat them the same: reads always set the flag and async
writes do not.

No functional changes in this patch.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 1c32f734c767..6b302bf8790b 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -20,6 +20,7 @@
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
  */
+#define IOMAP_DIO_INLINE_COMP	(1 << 27)
 #define IOMAP_DIO_WRITE_FUA	(1 << 28)
 #define IOMAP_DIO_NEED_SYNC	(1 << 29)
 #define IOMAP_DIO_WRITE		(1 << 30)
@@ -171,20 +172,25 @@ void iomap_dio_bio_end_io(struct bio *bio)
 	}

 	/*
-	 * If this dio is an async write, queue completion work for async
-	 * handling. Reads can always complete inline.
+	 * Flagged with IOMAP_DIO_INLINE_COMP, we can complete it inline
 	 */
-	if (dio->flags & IOMAP_DIO_WRITE) {
-		struct inode *inode = file_inode(iocb->ki_filp);
-
-		WRITE_ONCE(iocb->private, NULL);
-		INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
-		queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
-	} else {
+	if (dio->flags & IOMAP_DIO_INLINE_COMP) {
 		WRITE_ONCE(iocb->private, NULL);
 		iomap_dio_complete_work(&dio->aio.work);
+		goto release_bio;
 	}

+	/*
+	 * Async DIO completion that requires filesystem level completion work
+	 * gets punted to a work queue to complete as the operation may require
+	 * more IO to be issued to finalise filesystem metadata changes or
+	 * guarantee data integrity.
+	 */
+	WRITE_ONCE(iocb->private, NULL);
+	INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
+	queue_work(file_inode(iocb->ki_filp)->i_sb->s_dio_done_wq,
+			&dio->aio.work);
+
 release_bio:
 	if (should_dirty) {
 		bio_check_pages_dirty(bio);
@@ -524,6 +530,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		iomi.flags |= IOMAP_NOWAIT;

 	if (iov_iter_rw(iter) == READ) {
+		/* reads can always complete inline */
+		dio->flags |= IOMAP_DIO_INLINE_COMP;
+
 		if (iomi.pos >= dio->i_size)
 			goto out_free_dio;


From patchwork Wed Jul 19 19:54:14 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13319445
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 3/6] iomap: treat a write through cache the same as FUA
Date: Wed, 19 Jul 2023 13:54:14 -0600
Message-Id: <20230719195417.1704513-4-axboe@kernel.dk>
In-Reply-To: <20230719195417.1704513-1-axboe@kernel.dk>
References: <20230719195417.1704513-1-axboe@kernel.dk>

Whether we have a write back cache and are using FUA or don't have
a write back cache at all is the same situation. Treat them the same.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6b302bf8790b..b30c3edf2ef3 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -280,7 +280,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		 * cache flushes on IO completion.
 		 */
 		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
-		    (dio->flags & IOMAP_DIO_WRITE_FUA) && bdev_fua(iomap->bdev))
+		    (dio->flags & IOMAP_DIO_WRITE_FUA) &&
+		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
 			use_fua = true;
 	}


From patchwork Wed Jul 19 19:54:15 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13319446
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 4/6] fs: add IOCB flags related to passing back dio completions
Date: Wed, 19 Jul 2023 13:54:15 -0600
Message-Id: <20230719195417.1704513-5-axboe@kernel.dk>
In-Reply-To: <20230719195417.1704513-1-axboe@kernel.dk>
References: <20230719195417.1704513-1-axboe@kernel.dk>

Async dio completions generally happen from hard/soft IRQ context, which
means that users like iomap may need to defer some of the completion
handling to a workqueue. This is less efficient than having the original
issuer handle it, like we do for sync IO, and it adds latency to the
completions.

Add IOCB_DIO_DEFER, which the issuer can set if it is able to punt these
completions to a safe context. If the dio handler is aware of this flag,
it can assign a callback handler in kiocb->dio_complete and the
associated data in kiocb->private. The issuer will then call this
handler with that data from task context.

No functional changes in this patch.

Signed-off-by: Jens Axboe
---
 include/linux/fs.h | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6867512907d6..115382f66d79 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -338,6 +338,16 @@ enum rw_hint {
 #define IOCB_NOIO		(1 << 20)
 /* can use bio alloc cache */
 #define IOCB_ALLOC_CACHE	(1 << 21)
+/*
+ * IOCB_DIO_DEFER can be set by the iocb owner, to indicate that the
+ * iocb completion can be passed back to the owner for execution from a safe
+ * context rather than needing to be punted through a workqueue. If this
+ * flag is set, the completion handling may set iocb->dio_complete to a
+ * handler, which the issuer will then call from task context to complete
+ * the processing of the iocb. iocb->private should then also be set to
+ * the argument being passed to this handler.
+ */
+#define IOCB_DIO_DEFER		(1 << 22)

 /* for use in trace events */
 #define TRACE_IOCB_STRINGS \
@@ -351,7 +361,8 @@ enum rw_hint {
 	{ IOCB_WRITE,		"WRITE" }, \
 	{ IOCB_WAITQ,		"WAITQ" }, \
 	{ IOCB_NOIO,		"NOIO" }, \
-	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }
+	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }, \
+	{ IOCB_DIO_DEFER,	"DIO_DEFER" }

 struct kiocb {
 	struct file		*ki_filp;
@@ -360,7 +371,22 @@ struct kiocb {
 	void			*private;
 	int			ki_flags;
 	u16			ki_ioprio; /* See linux/ioprio.h */
-	struct wait_page_queue	*ki_waitq; /* for async buffered IO */
+	union {
+		/*
+		 * Only used for async buffered reads, where it denotes the
+		 * page waitqueue associated with completing the read. Valid
+		 * IFF IOCB_WAITQ is set.
+		 */
+		struct wait_page_queue	*ki_waitq;
+		/*
+		 * Can be used for O_DIRECT IO, where the completion handling
+		 * is punted back to the issuer of the IO. May only be set
+		 * if IOCB_DIO_DEFER is set by the issuer, and the issuer must
+		 * then check for presence of this handler when ki_complete is
+		 * invoked.
+		 */
+		ssize_t (*dio_complete)(void *data);
+	};
 };

 static inline bool is_sync_kiocb(struct kiocb *kiocb)

From patchwork Wed Jul 19 19:54:16 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13319447
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 5/6] io_uring/rw: add write support for IOCB_DIO_DEFER
Date: Wed, 19 Jul 2023 13:54:16 -0600
Message-Id: <20230719195417.1704513-6-axboe@kernel.dk>
In-Reply-To: <20230719195417.1704513-1-axboe@kernel.dk>
References: <20230719195417.1704513-1-axboe@kernel.dk>

If the filesystem dio handler understands IOCB_DIO_DEFER, we'll get
a kiocb->ki_complete() callback with kiocb->dio_complete set. In that
case, rather than complete the IO directly through task_work, queue
up an intermediate task_work handler that first processes this
callback and then immediately completes the request.

For XFS, this avoids a punt through a workqueue, which is a lot less
efficient and adds latency to lower queue depth (or sync) O_DIRECT
writes.

Signed-off-by: Jens Axboe
---
 io_uring/rw.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1bce2208b65c..4657e11acf02 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -285,6 +285,14 @@ static inline int io_fixup_rw_res(struct io_kiocb *req, long res)

 void io_req_rw_complete(struct io_kiocb *req, struct io_tw_state *ts)
 {
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+
+	if (rw->kiocb.dio_complete) {
+		long res = rw->kiocb.dio_complete(rw->kiocb.private);
+
+		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	}
+
 	io_req_io_end(req);

 	if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
@@ -300,9 +308,11 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
 	struct io_rw *rw = container_of(kiocb, struct io_rw, kiocb);
 	struct io_kiocb *req = cmd_to_io_kiocb(rw);

-	if (__io_complete_rw_common(req, res))
-		return;
-	io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	if (!rw->kiocb.dio_complete) {
+		if (__io_complete_rw_common(req, res))
+			return;
+		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	}
 	req->io_task_work.func = io_req_rw_complete;
 	__io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE);
 }
@@ -312,6 +322,9 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res)
 	struct io_rw *rw = container_of(kiocb, struct io_rw, kiocb);
 	struct io_kiocb *req = cmd_to_io_kiocb(rw);

+	if (rw->kiocb.dio_complete)
+		res = rw->kiocb.dio_complete(rw->kiocb.private);
+
 	if (kiocb->ki_flags & IOCB_WRITE)
 		kiocb_end_write(req);
 	if (unlikely(res != req->cqe.res)) {
@@ -914,7 +927,13 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 		__sb_writers_release(file_inode(req->file)->i_sb,
 					SB_FREEZE_WRITE);
 	}
-	kiocb->ki_flags |= IOCB_WRITE;
+
+	/*
+	 * Set IOCB_DIO_DEFER, stating that our handler groks deferring the
+	 * completion to task context.
+	 */
+	kiocb->ki_flags |= IOCB_WRITE | IOCB_DIO_DEFER;
+	kiocb->dio_complete = NULL;

 	if (likely(req->file->f_op->write_iter))
 		ret2 = call_write_iter(req->file, kiocb, &s->iter);

From patchwork Wed Jul 19 19:54:17 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13319448
kBPZ2hoO6+JtUocaEoPTyx+AnaVilw9JCpUvrbfOISFu8KlGjVmsTul36huDeAfZFZom famDHX0bTR7bVniAyiKg1W1WvspZAHb9qmUv5yZ40X+abb0PeHmq7e/9tRRPBZwXJzSF LX/bCOiBKfF7z0Og5QDpXU5O+x4rltwbjzmkW9YYP0nZISYPG2ZRZQWdKEDHgHFwV8lA cITkp69rHLOflHD7M2nh/eraJ8QIJhB9FK8g6SRJ2PIyDH4CPvm+UjZbjZi0UCHoDGfE yhKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689796471; x=1690401271; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kTvYQlDOhP6EOVmXA2umB2ZKrh0BWM6v9ydQBvMCtEo=; b=TAtCXRapIKi2yF9rvi08VXk7WGmZudIDVCq3XB4JjW0esiEufUdPEfq5L3YEODbolV o1/rY4U+X/W5myNPVjqxDyCMh3We/ACc/JethoGRj1muwi2G7WedGKe736a7KBsf+jRQ 7yqWz0TxU15Hi/uLvUj/UVzsCi1U/5YMldjrL8Z12VVXjh+3DQF8XBJrYXQ0bXzMU/7t m4Of8kNkm+N3l4RRGUM2GxUbctoEsEK/nSeLkV1L6WzEqoUnzFzkrJUfiB5jLTSBB2Bz ++H4MB4C7wz+mGpPmEwSWHEPFRaL7B6A72Xdz+8/7YmTSQLg0++sSDvJuQ5AUJ5NGdMw AA4A== X-Gm-Message-State: ABy/qLaHR0c7Nkgr8bkFBNK1+BcECDbRsKhOZopRZZDMz3MLe/I40WHu 312tbtxA0HtLR0HWwcYzTlrEmA== X-Google-Smtp-Source: APBJJlFnEcVMxJ5LOd6PR/a75IhjhuRWzWbS5J5uUVT/bik2+nthisdHTN19bYSQbo/z7VRMMUf64Q== X-Received: by 2002:a05:6602:3423:b0:780:d65c:d78f with SMTP id n35-20020a056602342300b00780d65cd78fmr586047ioz.2.1689796471049; Wed, 19 Jul 2023 12:54:31 -0700 (PDT) Received: from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id j21-20020a02a695000000b0042bb13cb80fsm1471893jam.120.2023.07.19.12.54.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 Jul 2023 12:54:30 -0700 (PDT) From: Jens Axboe To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe Subject: [PATCH 6/6] iomap: support IOCB_DIO_DEFER Date: Wed, 19 Jul 2023 13:54:17 -0600 Message-Id: <20230719195417.1704513-7-axboe@kernel.dk> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230719195417.1704513-1-axboe@kernel.dk> References: 
 <20230719195417.1704513-1-axboe@kernel.dk>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

If IOCB_DIO_DEFER is set, utilize that to set the kiocb->dio_complete
handler and the data for that callback. Rather than punt the completion
to a workqueue, we pass back the handler and data to the issuer and will
get a callback from a safe task context.

Using the following fio job to randomly dio write 4k blocks at
queue depths of 1..16:

fio --name=dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=$depth

shows the following results before and after this patch:

        Stock   Patched         Diff
=======================================
QD1     155K    162K            + 4.5%
QD2     290K    313K            + 7.9%
QD4     533K    597K            +12.0%
QD8     604K    827K            +36.9%
QD16    615K    845K            +37.4%

which shows nice wins all around. If we factored in per-IOP efficiency,
the wins look even nicer. This becomes apparent as queue depth rises,
as the offloaded workqueue completions run out of steam.

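For readers who want the handshake spelled out, the following is a rough
userspace sketch of the flow described above, not the kernel code. Every
model_* name and MODEL_IOCB_DIO_DEFER is a hypothetical stand-in invented
for illustration. The non-deferred path, which the kernel punts to a
workqueue, is collapsed into a direct call here to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IOCB_DIO_DEFER; not the kernel's value. */
#define MODEL_IOCB_DIO_DEFER (1u << 0)

/* Hypothetical, heavily reduced model of a kiocb. */
struct model_kiocb {
	unsigned int flags;
	void *private_data;
	long (*dio_complete)(void *data);	/* set by the filesystem side */
	void (*ki_complete)(struct model_kiocb *iocb, long res);
	long final_res;				/* result seen by the issuer */
	int deferred;				/* completion waiting for task context */
};

/* Hypothetical, heavily reduced model of an iomap_dio. */
struct model_dio {
	long size;
	int error;
};

/* Filesystem-level completion: yields the real result of the request. */
static long model_dio_complete(void *data)
{
	struct model_dio *dio = data;

	return dio->error ? dio->error : dio->size;
}

/* Issuer's completion handler, as it might be called from IRQ context. */
static void model_issuer_complete(struct model_kiocb *iocb, long res)
{
	if (iocb->dio_complete) {
		/* 'res' is ignored; the real value comes from dio_complete. */
		iocb->deferred = 1;
		return;
	}
	iocb->final_res = res;
}

/* Models the end_io path choosing between deferring and completing. */
static void model_bio_end_io(struct model_kiocb *iocb, struct model_dio *dio)
{
	if (iocb->flags & MODEL_IOCB_DIO_DEFER) {
		iocb->private_data = dio;
		iocb->dio_complete = model_dio_complete;
		iocb->ki_complete(iocb, 0);
		return;
	}
	/* Simplification: the kernel would punt this to a workqueue. */
	iocb->ki_complete(iocb, model_dio_complete(dio));
}

/* Models the issuer running deferred completions from task context. */
static void model_issuer_task_work(struct model_kiocb *iocb)
{
	if (iocb->deferred) {
		iocb->final_res = iocb->dio_complete(iocb->private_data);
		iocb->deferred = 0;
	}
}

/* Drives one modeled write through both stages and returns its result. */
static long model_run_write(unsigned int flags, long bytes, int error)
{
	struct model_dio dio = { .size = bytes, .error = error };
	struct model_kiocb iocb = {
		.flags = flags,
		.ki_complete = model_issuer_complete,
		.final_res = -1,
	};

	model_bio_end_io(&iocb, &dio);
	model_issuer_task_work(&iocb);
	return iocb.final_res;
}
```

Calling model_run_write(MODEL_IOCB_DIO_DEFER, 4096, 0) walks the deferred
path: the result only becomes visible once the task-context step has run,
which is the property the patch relies on.
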
Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 47 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index b30c3edf2ef3..b7055d50dd99 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -20,6 +20,7 @@
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
  */
+#define IOMAP_DIO_DEFER_COMP	(1 << 26)
 #define IOMAP_DIO_INLINE_COMP	(1 << 27)
 #define IOMAP_DIO_WRITE_FUA	(1 << 28)
 #define IOMAP_DIO_NEED_SYNC	(1 << 29)
@@ -131,6 +132,11 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 }
 EXPORT_SYMBOL_GPL(iomap_dio_complete);
 
+static ssize_t iomap_dio_deferred_complete(void *data)
+{
+	return iomap_dio_complete(data);
+}
+
 static void iomap_dio_complete_work(struct work_struct *work)
 {
 	struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work);
@@ -180,6 +186,31 @@ void iomap_dio_bio_end_io(struct bio *bio)
 		goto release_bio;
 	}
 
+	/*
+	 * If this dio is flagged with IOMAP_DIO_DEFER_COMP, then schedule
+	 * our completion that way to avoid an async punt to a workqueue.
+	 */
+	if (dio->flags & IOMAP_DIO_DEFER_COMP) {
+		/* only polled IO cares about private cleared */
+		iocb->private = dio;
+		iocb->dio_complete = iomap_dio_deferred_complete;
+
+		/*
+		 * Invoke ->ki_complete() directly. We've assigned our
+		 * dio_complete callback handler, and since the issuer set
+		 * IOCB_DIO_DEFER, we know their ki_complete handler will
+		 * notice ->dio_complete being set and will defer calling that
+		 * handler until it can be done from a safe task context.
+		 *
+		 * Note that the 'res' being passed in here is not important
+		 * for this case. The actual completion value of the request
+		 * will be gotten from dio_complete when that is run by the
+		 * issuer.
+		 */
+		iocb->ki_complete(iocb, 0);
+		goto release_bio;
+	}
+
 	/*
 	 * Async DIO completion that requires filesystem level completion work
 	 * gets punted to a work queue to complete as the operation may require
@@ -277,12 +308,15 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		 * data IO that doesn't require any metadata updates (including
 		 * after IO completion such as unwritten extent conversion) and
 		 * the underlying device supports FUA. This allows us to avoid
-		 * cache flushes on IO completion.
+		 * cache flushes on IO completion. If we can't use FUA and
+		 * need to sync, disable in-task completions.
 		 */
 		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
 		    (dio->flags & IOMAP_DIO_WRITE_FUA) &&
 		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
 			use_fua = true;
+		else if (dio->flags & IOMAP_DIO_NEED_SYNC)
+			dio->flags &= ~IOMAP_DIO_DEFER_COMP;
 	}
 
 	/*
@@ -308,6 +342,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		pad = pos & (fs_block_size - 1);
 		if (pad)
 			iomap_dio_zero(iter, dio, pos - pad, pad);
+
+		dio->flags &= ~IOMAP_DIO_DEFER_COMP;
 	}
 
 	/*
@@ -547,6 +583,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		iomi.flags |= IOMAP_WRITE;
 		dio->flags |= IOMAP_DIO_WRITE;
 
+		/*
+		 * Flag as supporting deferred completions, if the issuer
+		 * groks it. This can avoid a workqueue punt for writes.
+		 * We may later clear this flag if we need to do other IO
+		 * as part of this IO completion.
+		 */
+		if (iocb->ki_flags & IOCB_DIO_DEFER)
+			dio->flags |= IOMAP_DIO_DEFER_COMP;
+
 		if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
 			ret = -EAGAIN;
 			if (iomi.pos >= dio->i_size ||