From patchwork Sun Nov 28 05:52:49 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12642775
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2EC02C433F5 for ; Sun, 28 Nov 2021 05:55:26 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232179AbhK1F6i (ORCPT ); Sun, 28 Nov 2021 00:58:38 -0500
Received: from smtp-out1.suse.de ([195.135.220.28]:35610 "EHLO smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232434AbhK1F4g (ORCPT ); Sun, 28 Nov 2021 00:56:36 -0500
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 8C18021639; Sun, 28 Nov 2021 05:53:20 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1638078800; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uvC8nQa6w+nEdJ1prtfmpQgHUacFxgCpjBjtbrUKVMU=; b=tSRK0nHd0vLUNHsjf0tMYEvjM6Rm/RLsYBGPCkW1QZJ2sRZ9yVrX6EYUl81nyUoYrXQrjc +zmchniJ1BqPEynFsObuO+PXOLjj4o/IAaDnuMcInrLCBPZwBeju/KUFQocl9CmbavXBWu 5/0n5RQqgq/siq3qKJKlLSswRadx+RY=
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 3003913446; Sun, 28 Nov 2021 05:53:19 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id uG+WAE8Zo2G7fAAAMHmgww (envelope-from ); Sun, 28 Nov 2021 05:53:19 +0000
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH RFC 01/11] btrfs: update an stale comment on btrfs_submit_bio_hook()
Date: Sun, 28 Nov 2021 13:52:49 +0800
Message-Id: <20211128055259.39249-2-wqu@suse.com>
X-Mailer: git-send-email 2.34.0
In-Reply-To: <20211128055259.39249-1-wqu@suse.com>
References: <20211128055259.39249-1-wqu@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

This function has been renamed to btrfs_submit_data_bio(). Update the
comment and add an extra explanation of why it doesn't completely
follow the same rules as btrfs_submit_data_bio().

Signed-off-by: Qu Wenruo
---
 fs/btrfs/inode.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 91f7ed27e421..91c59b6e279d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8197,7 +8197,13 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 	bool write = btrfs_op(bio) == BTRFS_MAP_WRITE;
 	blk_status_t ret;
 
-	/* Check btrfs_submit_bio_hook() for rules about async submit. */
+	/*
+	 * Check btrfs_submit_data_bio() for rules about async submit.
+	 *
+	 * The only exception is for RAID56, when there are more than one bios
+	 * to submit, async submit seems to make it harder to collect csums
+	 * for the full stripe.
+	 */
 	if (async_submit)
 		async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
 

From patchwork Sun Nov 28 05:52:50 2021
X-Patchwork-Id: 12642777
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH RFC 02/11] btrfs: refactor btrfs_map_bio()
Date: Sun, 28 Nov 2021 13:52:50 +0800
Message-Id: <20211128055259.39249-3-wqu@suse.com>
In-Reply-To: <20211128055259.39249-1-wqu@suse.com>
References: <20211128055259.39249-1-wqu@suse.com>

Currently in btrfs_map_bio() we call __btrfs_map_block(), then use the
returned bioc to submit the real stripes.

This is fine if we only handle one bio at a time.

For the incoming bio split at btrfs_map_bio() time, we want to handle
several different bios, thus we introduce a new helper,
submit_one_mapped_range(), to handle the submission part, making it
much easier to run in a loop.
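
As a rough illustration of why a separate submission helper makes the
loop easy, here is a minimal user-space sketch. It is not code from this
series: the 64 KiB boundary is an assumed toy value, and
toy_map_length(), toy_submit_one_mapped_range() and toy_map_and_submit()
are hypothetical stand-ins for __btrfs_map_block(),
submit_one_mapped_range() and a looping btrfs_map_bio().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed toy stripe boundary (64 KiB), for illustration only. */
#define TOY_STRIPE_LEN 65536u

/*
 * Hypothetical stand-in for the mapping step: clamp @length so the
 * returned range never crosses a stripe boundary.
 */
static uint64_t toy_map_length(uint64_t logical, uint64_t length)
{
	uint64_t left = TOY_STRIPE_LEN - (logical % TOY_STRIPE_LEN);

	return length < left ? length : left;
}

/*
 * Hypothetical stand-in for the submission step: only counts how many
 * mapped ranges were submitted.
 */
static int toy_submit_one_mapped_range(uint64_t logical, uint64_t map_length,
				       unsigned int *submitted)
{
	(void)logical;
	if (map_length == 0)
		return -1;
	(*submitted)++;
	return 0;
}

/*
 * Map, submit, advance, repeat.  Returns the number of submitted ranges
 * or a negative value on error.
 */
static int toy_map_and_submit(uint64_t logical, uint64_t length)
{
	unsigned int submitted = 0;

	while (length > 0) {
		uint64_t map_length = toy_map_length(logical, length);
		int ret;

		/* The mapped range is never empty and never too long. */
		assert(map_length > 0 && map_length <= length);

		ret = toy_submit_one_mapped_range(logical, map_length,
						  &submitted);
		if (ret < 0)
			return ret;

		logical += map_length;
		length -= map_length;
	}
	return (int)submitted;
}
```

With these toy values, a 64 KiB range starting 4 KiB into a stripe is
submitted as two ranges, while an aligned 64 KiB range needs only one.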

Signed-off-by: Qu Wenruo
---
 fs/btrfs/volumes.c | 65 ++++++++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3dc1de376966..1d40f4fa64a3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6772,29 +6772,15 @@ static void bioc_error(struct btrfs_io_context *bioc, struct bio *bio, u64 logic
 	}
 }
 
-blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-			   int mirror_num)
+static int submit_one_mapped_range(struct btrfs_fs_info *fs_info, struct bio *bio,
+				   struct btrfs_io_context *bioc, u64 map_length,
+				   int mirror_num)
 {
-	struct btrfs_device *dev;
 	struct bio *first_bio = bio;
-	u64 logical = bio->bi_iter.bi_sector << 9;
-	u64 length = 0;
-	u64 map_length;
-	int ret;
-	int dev_nr;
+	u64 logical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
 	int total_devs;
-	struct btrfs_io_context *bioc = NULL;
-
-	length = bio->bi_iter.bi_size;
-	map_length = length;
-
-	btrfs_bio_counter_inc_blocked(fs_info);
-	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
-				&map_length, &bioc, mirror_num, 1);
-	if (ret) {
-		btrfs_bio_counter_dec(fs_info);
-		return errno_to_blk_status(ret);
-	}
+	int dev_nr;
+	int ret;
 
 	total_devs = bioc->num_stripes;
 	bioc->orig_bio = first_bio;
@@ -6813,18 +6799,19 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 						    mirror_num, 1);
 		}
 
-		btrfs_bio_counter_dec(fs_info);
-		return errno_to_blk_status(ret);
+		return ret;
 	}
 
-	if (map_length < length) {
+	if (map_length < bio->bi_iter.bi_size) {
 		btrfs_crit(fs_info,
-			   "mapping failed logical %llu bio len %llu len %llu",
-			   logical, length, map_length);
+			   "mapping failed logical %llu bio len %u len %llu",
+			   logical, bio->bi_iter.bi_size, map_length);
 		BUG();
 	}
 
 	for (dev_nr = 0; dev_nr < total_devs; dev_nr++) {
+		struct btrfs_device *dev;
+
 		dev = bioc->stripes[dev_nr].dev;
 		if (!dev || !dev->bdev || test_bit(BTRFS_DEV_STATE_MISSING,
 						   &dev->dev_state) ||
@@ -6841,6 +6828,34 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 
 		submit_stripe_bio(bioc, bio, bioc->stripes[dev_nr].physical, dev);
 	}
+	return 0;
+}
+
+blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
+			   int mirror_num)
+{
+	u64 logical = bio->bi_iter.bi_sector << 9;
+	u64 length = 0;
+	u64 map_length;
+	int ret;
+	struct btrfs_io_context *bioc = NULL;
+
+	length = bio->bi_iter.bi_size;
+	map_length = length;
+
+	btrfs_bio_counter_inc_blocked(fs_info);
+	ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical,
+				&map_length, &bioc, mirror_num, 1);
+	if (ret) {
+		btrfs_bio_counter_dec(fs_info);
+		return errno_to_blk_status(ret);
+	}
+
+	ret = submit_one_mapped_range(fs_info, bio, bioc, map_length, mirror_num);
+	if (ret < 0) {
+		btrfs_bio_counter_dec(fs_info);
+		return errno_to_blk_status(ret);
+	}
 	btrfs_bio_counter_dec(fs_info);
 	return BLK_STS_OK;
 }

From patchwork Sun Nov 28 05:52:51 2021
X-Patchwork-Id: 12642779
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH RFC 03/11] btrfs: move btrfs_bio_wq_end_io() calls into submit_stripe_bio()
Date: Sun, 28 Nov 2021 13:52:51 +0800
Message-Id: <20211128055259.39249-4-wqu@suse.com>
In-Reply-To: <20211128055259.39249-1-wqu@suse.com>
References: <20211128055259.39249-1-wqu@suse.com>

This is a preparation patch for the incoming chunk mapping layer bio
split.

Function btrfs_bio_wq_end_io() remaps bio::bi_private and
bio::bi_end_io so that the real endio function is executed in a
workqueue.

The problem is that the remapped bio::bi_private points to newly
allocated memory, which is freed after the original endio function has
executed.

This does not work well with split bios.

So this patch moves all btrfs_bio_wq_end_io() calls into one helper
function, btrfs_bio_final_endio_remap(), and calls that helper in
submit_stripe_bio().

This refactor also unifies the behavior of all data bios.
Before this patch, the endio of a compressed bio, whether read or
write, was always delayed using a workqueue.

However, all data write operations are already delayed using ordered
extents, and metadata writes don't need any delayed execution.

Thus this patch makes compressed bios follow the same data read/write
behavior.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/compression.c |  4 +---
 fs/btrfs/disk-io.c     |  9 +--------
 fs/btrfs/inode.c       | 20 +++++---------------
 fs/btrfs/volumes.c     | 41 +++++++++++++++++++++++++++++++++++++----
 fs/btrfs/volumes.h     |  9 ++++++++-
 5 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 32da97c3c19d..64f931fc11f0 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -428,10 +428,8 @@ static blk_status_t submit_compressed_bio(struct btrfs_fs_info *fs_info,
 {
 	blk_status_t ret;
 
+	btrfs_bio(bio)->endio_type = BTRFS_WQ_ENDIO_DATA;
 	ASSERT(bio->bi_iter.bi_size);
-	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-	if (ret)
-		return ret;
 	ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	return ret;
 }
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9d66d48945c6..83218d3dae00 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -920,14 +920,7 @@ blk_status_t btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio,
 	blk_status_t ret;
 
 	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
-		/*
-		 * called for a read, do the setup so that checksum validation
-		 * can happen in the async kernel threads
-		 */
-		ret = btrfs_bio_wq_end_io(fs_info, bio,
-					  BTRFS_WQ_ENDIO_METADATA);
-		if (ret)
-			goto out_w_error;
+		btrfs_bio(bio)->endio_type = BTRFS_WQ_ENDIO_METADATA;
 		ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	} else if (!should_async_write(fs_info, BTRFS_I(inode))) {
 		ret = btree_csum_one_bio(bio);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 91c59b6e279d..8e6901aeeb89 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2510,7 +2510,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_root *root = BTRFS_I(inode)->root;
-	enum btrfs_wq_endio_type metadata = BTRFS_WQ_ENDIO_DATA;
+	enum btrfs_wq_endio_type endio_type = BTRFS_WQ_ENDIO_DATA;
 	blk_status_t ret = 0;
 	int skip_sum;
 	int async = !atomic_read(&BTRFS_I(inode)->sync_writers);
@@ -2519,7 +2519,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 		   !fs_info->csum_root;
 
 	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
-		metadata = BTRFS_WQ_ENDIO_FREE_SPACE;
+		endio_type = BTRFS_WQ_ENDIO_FREE_SPACE;
 
 	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
 		struct page *page = bio_first_bvec_all(bio)->bv_page;
@@ -2531,10 +2531,7 @@ blk_status_t btrfs_submit_data_bio(struct inode *inode, struct bio *bio,
 	}
 
 	if (btrfs_op(bio) != BTRFS_MAP_WRITE) {
-		ret = btrfs_bio_wq_end_io(fs_info, bio, metadata);
-		if (ret)
-			goto out;
-
+		btrfs_bio(bio)->endio_type = endio_type;
 		if (bio_flags & EXTENT_BIO_COMPRESSED) {
 			ret = btrfs_submit_compressed_read(inode, bio,
 							   mirror_num,
@@ -8085,10 +8082,6 @@ static blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio,
 
 	BUG_ON(bio_op(bio) == REQ_OP_WRITE);
 
-	ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-	if (ret)
-		return ret;
-
 	refcount_inc(&dip->refs);
 	ret = btrfs_map_bio(fs_info, bio, mirror_num);
 	if (ret)
@@ -8207,11 +8200,8 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio,
 	if (async_submit)
 		async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers);
 
-	if (!write) {
-		ret = btrfs_bio_wq_end_io(fs_info, bio, BTRFS_WQ_ENDIO_DATA);
-		if (ret)
-			goto err;
-	}
+	if (!write)
+		btrfs_bio(bio)->endio_type = BTRFS_WQ_ENDIO_DATA;
 
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
 		goto map;
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1d40f4fa64a3..f7449bf7a595 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6720,10 +6720,31 @@ static void btrfs_end_bio(struct bio *bio)
 	}
 }
 
-static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
-			      u64 physical, struct btrfs_device *dev)
+/*
+ * Endio remaps which can't handle cloned bio needs to go here.
+ *
+ * Currently it's only btrfs_bio_wq_end_io().
+ */
+static int btrfs_bio_final_endio_remap(struct btrfs_fs_info *fs_info,
+				       struct bio *bio)
+{
+	blk_status_t sts;
+
+	/* For write bio, we don't to put their endio into wq */
+	if (btrfs_op(bio) == BTRFS_MAP_WRITE)
+		return 0;
+
+	sts = btrfs_bio_wq_end_io(fs_info, bio, btrfs_bio(bio)->endio_type);
+	if (sts != BLK_STS_OK)
+		return blk_status_to_errno(sts);
+	return 0;
+}
+
+static int submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
+			     u64 physical, struct btrfs_device *dev)
 {
 	struct btrfs_fs_info *fs_info = bioc->fs_info;
+	int ret;
 
 	bio->bi_private = bioc;
 	btrfs_bio(bio)->device = dev;
@@ -6750,9 +6771,14 @@ static void submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio,
 		dev->devid, bio->bi_iter.bi_size);
 	bio_set_dev(bio, dev->bdev);
 
-	btrfs_bio_counter_inc_noblocked(fs_info);
+	/* Do the final endio remap if needed */
+	ret = btrfs_bio_final_endio_remap(fs_info, bio);
+	if (ret < 0)
+		return ret;
 
+	btrfs_bio_counter_inc_noblocked(fs_info);
 	btrfsic_submit_bio(bio);
+	return ret;
 }
 
 static void bioc_error(struct btrfs_io_context *bioc, struct bio *bio, u64 logical)
@@ -6826,9 +6852,16 @@ static int submit_one_mapped_range(struct btrfs_fs_info *fs_info, struct bio *bi
 		else
 			bio = first_bio;
 
-		submit_stripe_bio(bioc, bio, bioc->stripes[dev_nr].physical, dev);
+		ret = submit_stripe_bio(bioc, bio,
+					bioc->stripes[dev_nr].physical, dev);
+		if (ret < 0)
+			goto error;
 	}
 	return 0;
+error:
+	for (; dev_nr < total_devs; dev_nr++)
+		bioc_error(bioc, first_bio, logical);
+	return ret;
 }
 
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 3b8130680749..27d396c152c6 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -328,7 +328,14 @@ struct btrfs_fs_devices {
  * Mostly for btrfs specific features like csum and mirror_num.
  */
 struct btrfs_bio {
-	unsigned int mirror_num;
+	u16 mirror_num;
+
+	/*
+	 * To tell which workqueue the bio's endio should be exeucted in.
+	 *
+	 * Only for read bios.
+	 */
+	u16 endio_type;
 
 	/* @device is for stripe IO submission. */
 	struct btrfs_device *device;

From patchwork Sun Nov 28 05:52:52 2021
X-Patchwork-Id: 12642781
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH RFC 04/11] btrfs: introduce btrfs_bio_split() helper
Date: Sun, 28 Nov 2021 13:52:52 +0800
Message-Id: <20211128055259.39249-5-wqu@suse.com>
In-Reply-To: <20211128055259.39249-1-wqu@suse.com>
References: <20211128055259.39249-1-wqu@suse.com>

This new function handles the split of a btrfs bio, to cooperate with
the incoming chunk mapping time bio split.

This patch introduces the following new members and helper:

- btrfs_bio::offset_to_original
  Since btrfs_bio::csum still stores the checksum for the original
  logical bytenr, we need to know the offset between the current
  advanced bio and the original logical bytenr.

  Thus we need this new member.
  It fits into the existing hole between btrfs_bio::mirror_num and
  btrfs_bio::device, so it should not increase the memory usage of
  btrfs_bio.

- btrfs_bio::parent and btrfs_bio::orig_endio
  To record where the parent bio is and the original endio function.

- btrfs_bio::is_split_bio
  To distinguish bios created by btrfs_bio_split() from those created
  by btrfs_bio_clone*().

  Cloned bios still have their csum pointing to the correct memory,
  while a split bio must rely on its parent bbio to grab the csum
  pointer.

- split_bio_endio()
  Just calls the original endio function, then calls bio_endio() on
  the original bio.
  This ensures the original bio is freed after all cloned bios.

Currently there are no callers utilizing the above new
members/functions yet.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 76 +++++++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/extent_io.h |  2 ++
 fs/btrfs/volumes.h   | 43 +++++++++++++++++++++++--
 3 files changed, 117 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b289d26aca0d..34195891b0b5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3005,7 +3005,6 @@ static void end_bio_extent_readpage(struct bio *bio)
 	int ret;
 	struct bvec_iter_all iter_all;
 
-	ASSERT(!bio_flagged(bio, BIO_CLONED));
 	bio_for_each_segment_all(bvec, bio, iter_all) {
 		bool uptodate = !bio->bi_status;
 		struct page *page = bvec->bv_page;
@@ -3184,6 +3183,81 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size)
 	return bio;
 }
 
+/*
+ * A very simple wrapper to call original endio function and then
+ * call bio_endio() on the parent bio to decrease its bi_remaining count.
+ */
+static void split_bio_endio(struct bio *bio)
+{
+	struct btrfs_bio *bbio = btrfs_bio(bio);
+	/* After endio bbio could be freed, thus grab the info before endio */
+	struct bio *parent = bbio->parent;
+
+	/*
+	 * BIO_CLONED can even be set for our parent bio (DIO use clones
+	 * the initial bio, then uses the cloned one for IO).
+	 * So here we don't check BIO_CLONED for parent.
+	 */
+	ASSERT(bio_flagged(bio, BIO_CLONED) && bbio->is_split_bio);
+	ASSERT(bbio->parent && !bbio->is_split_bio);
+
+	bio->bi_end_io = bbio->orig_endio;
+	bio_endio(bio);
+	bio_endio(parent);
+}
+
+/*
+ * Pretty much like bio_split(), caller needs to ensure @src is not freed
+ * before the newly allocated bio, as the new bio is relying on @src for
+ * its bvecs.
+ */
+struct bio *btrfs_bio_split(struct btrfs_fs_info *fs_info,
+			    struct bio *src, unsigned int bytes)
+{
+	struct bio *new;
+	struct btrfs_bio *src_bbio = btrfs_bio(src);
+	struct btrfs_bio *new_bbio;
+	const unsigned int old_offset = src_bbio->offset_to_original;
+
+	/* Src should not be split */
+	ASSERT(!src_bbio->is_split_bio);
+	ASSERT(IS_ALIGNED(bytes, fs_info->sectorsize));
+	ASSERT(bytes < src->bi_iter.bi_size);
+
+	/*
+	 * We're in fact chaining the new bio to the parent, but we still want
+	 * to have independent bi_private/bi_endio, thus we need to manually
+	 * increase the remaining for the source, just like bio_chain().
+	 */
+	bio_inc_remaining(src);
+
+	/* Bioset backed split should not fail */
+	new = bio_split(src, bytes >> SECTOR_SHIFT, GFP_NOFS, &btrfs_bioset);
+	new_bbio = btrfs_bio(new);
+	new_bbio->offset_to_original = old_offset;
+	new_bbio->iter = new->bi_iter;
+	new_bbio->orig_endio = src->bi_end_io;
+	new_bbio->parent = src;
+	new_bbio->endio_type = src_bbio->endio_type;
+	new_bbio->is_split_bio = 1;
+	new->bi_end_io = split_bio_endio;
+
+	/*
+	 * This is very tricky, as if any endio has extra refcount on
+	 * bi_private, we will be screwed up.
+	 *
+	 * We workaround this hacky behavior by reviewing all the involved
+	 * endio stacks. Making sure only split-safe endio remap are called.
+	 *
+	 * Split-unsafe endio remap like btrfs_bio_wq_end_io() will be called
+	 * after btrfs_bio_split().
+	 */
+	new->bi_private = src->bi_private;
+
+	src_bbio->offset_to_original += bytes;
+	return new;
+}
+
 /**
  * Attempt to add a page to bio
  *
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 0399cf8e3c32..cb727b77ecda 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -280,6 +280,8 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 struct bio *btrfs_bio_alloc(unsigned int nr_iovecs);
 struct bio *btrfs_bio_clone(struct bio *bio);
 struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size);
+struct bio *btrfs_bio_split(struct btrfs_fs_info *fs_info,
+			    struct bio *src, unsigned int bytes);
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 27d396c152c6..358fc546d611 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -332,15 +332,52 @@ struct btrfs_bio {
 
 	/*
 	 * To tell which workqueue the bio's endio should be exeucted in.
+	 * This member is to make sure btrfs_bio_wq_end_io() is the last
+	 * endio remap in the stack.
 	 *
 	 * Only for read bios.
 	 */
-	u16 endio_type;
+	u8 endio_type;
+
+	/*
+	 * To tell if this btrfs bio is split or just cloned.
+	 * Both btrfs_bio_clone*() and btrfs_bio_split() will make bbio->bio
+	 * to have BIO_CLONED flag.
+	 * But cloned bio still has its bbio::csum pointed to correct memory,
+	 * unlike split bio relies on its parent bbio to grab csum.
+	 *
+	 * Thus we needs this extra flag to distinguish those cloned bio.
+	 */
+	u8 is_split_bio;
+
+	/*
+	 * Records the offset we're from the original bio.
+	 *
+	 * Since btrfs_bio can be split, but our csum is alwasy for the
+	 * original logical bytenr, we need a way to know the bytes offset
+	 * from the original logical bytenr to do proper csum verification.
+	 */
+	unsigned int offset_to_original;
 
 	/* @device is for stripe IO submission. */
 	struct btrfs_device *device;
-	u8 *csum;
-	u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
+
+	union {
+		/*
+		 * For the parent bio recording the csum for the original
+		 * logical bytenr
+		 */
+		struct {
+			u8 *csum;
+			u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE];
+		};
+
+		/* For child (split) bio to record where its parent is */
+		struct {
+			struct bio *parent;
+			bio_end_io_t *orig_endio;
+		};
+	};
 	struct bvec_iter iter;
 
 	/*

From patchwork Sun Nov 28 05:52:53 2021
X-Patchwork-Id: 12642783
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH RFC 05/11] btrfs: save bio::bi_iter into btrfs_bio::iter before submitting
Date: Sun, 28 Nov 2021 13:52:53 +0800
Message-Id: <20211128055259.39249-6-wqu@suse.com>
In-Reply-To: <20211128055259.39249-1-wqu@suse.com>
References: <20211128055259.39249-1-wqu@suse.com>

Since the block layer advances bio::bi_iter, at endio time we can no
longer rely on bio::bi_iter for a split bio.

But for the incoming btrfs_bio split at btrfs_map_bio() time, we have
to ensure the endio function is only executed for the split range, not
the whole original bio.

Thus this patch introduces a new helper, btrfs_bio_save_iter(), to save
bi_iter into btrfs_bio::iter.

The following call sites need this helper call:

- btrfs_submit_compressed_read()
  For compressed reads. Compressed writes don't really care, as they
  use ordered extents.

- raid56_parity_write()
- raid56_parity_recover()
  For RAID56.

- submit_stripe_bio()
  For all other cases.
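
The save-once behavior can be illustrated with a minimal user-space
sketch. It is not code from this series: the toy_* structures and
functions are hypothetical simplifications of struct bio, struct
btrfs_bio and btrfs_bio_save_iter(), and toy_advance() only imitates how
the block layer consumes bi_iter while the IO completes.

```c
#include <assert.h>

#define TOY_SECTOR_SHIFT 9

/* Hypothetical, simplified stand-ins for bvec_iter, bio and btrfs_bio. */
struct toy_iter {
	unsigned long long sector;
	unsigned int size;
};

struct toy_bio {
	struct toy_iter bi_iter;
};

struct toy_bbio {
	struct toy_iter iter;	/* saved copy, zeroed until first save */
	struct toy_bio bio;
};

/* Save the live iterator only if nothing has been saved yet. */
static void toy_save_iter(struct toy_bbio *bbio)
{
	if (!bbio->iter.size)
		bbio->iter = bbio->bio.bi_iter;
}

/* Imitate the block layer consuming @bytes of the bio. */
static void toy_advance(struct toy_bio *bio, unsigned int bytes)
{
	assert(bytes <= bio->bi_iter.size);
	bio->bi_iter.sector += bytes >> TOY_SECTOR_SHIFT;
	bio->bi_iter.size -= bytes;
}

/*
 * Save, complete the whole IO, then try to save again.  Reports the
 * remaining live size and the saved size seen at "endio" time.
 */
static void toy_simulate_io(unsigned int bytes, unsigned int *live_size,
			    unsigned int *saved_size)
{
	struct toy_bbio bbio = {
		.iter = { 0, 0 },
		.bio = { .bi_iter = { 2048, bytes } },
	};

	toy_save_iter(&bbio);		/* before submission */
	toy_advance(&bbio.bio, bytes);	/* IO consumes the live iterator */
	toy_save_iter(&bbio);		/* no-op: a copy is already saved */

	*live_size = bbio.bio.bi_iter.size;
	*saved_size = bbio.iter.size;
}
```

In this sketch the live iterator is empty once the IO has finished,
while the saved copy still describes the full submitted range, which is
what an endio handler would need to walk.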
Signed-off-by: Qu Wenruo --- fs/btrfs/compression.c | 3 +++ fs/btrfs/raid56.c | 2 ++ fs/btrfs/volumes.c | 11 +++++++++++ fs/btrfs/volumes.h | 19 +++++++++++++++++++ 4 files changed, 35 insertions(+) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 64f931fc11f0..943e5898fa87 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -867,6 +867,9 @@ blk_status_t btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, /* include any pages we added in add_ra-bio_pages */ cb->len = bio->bi_iter.bi_size; + /* Save bi_iter so that end_bio_extent_readpage() won't freak out. */ + btrfs_bio_save_iter(btrfs_bio(bio)); + while (cur_disk_byte < disk_bytenr + compressed_len) { u64 offset = cur_disk_byte - disk_bytenr; unsigned int index = offset >> PAGE_SHIFT; diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 0e239a4c3b26..13e726c88a81 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -1731,6 +1731,7 @@ int raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc, return PTR_ERR(rbio); } bio_list_add(&rbio->bio_list, bio); + btrfs_bio_save_iter(btrfs_bio(bio)); rbio->bio_list_bytes = bio->bi_iter.bi_size; rbio->operation = BTRFS_RBIO_WRITE; @@ -2135,6 +2136,7 @@ int raid56_parity_recover(struct bio *bio, struct btrfs_io_context *bioc, rbio->operation = BTRFS_RBIO_READ_REBUILD; bio_list_add(&rbio->bio_list, bio); + btrfs_bio_save_iter(btrfs_bio(bio)); rbio->bio_list_bytes = bio->bi_iter.bi_size; rbio->faila = find_logical_bio_stripe(rbio, bio); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f7449bf7a595..e6ed71195e18 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6771,6 +6771,17 @@ static int submit_stripe_bio(struct btrfs_io_context *bioc, struct bio *bio, dev->devid, bio->bi_iter.bi_size); bio_set_dev(bio, dev->bdev); + /* + * At endio time, bi_iter is no longer reliable, thus we have to save + * current bi_iter into btrfs_bio so that even for split bio we can + * iterate only the split 
part. + * + * For bios created by btrfs_bio_split() or btrfs_bio_clone*(), it's + * already set, but we can still have the original bio which has its + * iter not initialized. + */ + btrfs_bio_save_iter(btrfs_bio(bio)); + /* Do the final endio remap if needed */ ret = btrfs_bio_final_endio_remap(fs_info, bio); if (ret < 0) diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 358fc546d611..baccf895a544 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -378,6 +378,13 @@ struct btrfs_bio { bio_end_io_t *orig_endio; }; }; + + /* + * Saved bio::bi_iter before submission. + * + * This allows us to iterate the cloned/split bio properly, as at + * endio time bio::bi_iter is no longer reliable. + */ struct bvec_iter iter; /* @@ -400,6 +407,18 @@ static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio) } } +/* + * Save bbio::bio->bi_iter into bbio::iter so that callers who need the + * original bi_iter can access the original part of the bio. + * This is especially important for the incoming split btrfs_bio, which needs + * to call its endio for and only for the split range. 
+ */ +static inline void btrfs_bio_save_iter(struct btrfs_bio *bbio) +{ + if (!bbio->iter.bi_size) + bbio->iter = bbio->bio.bi_iter; +} + struct btrfs_io_stripe { struct btrfs_device *dev; u64 physical; From patchwork Sun Nov 28 05:52:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12642785 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4AE2AC4321E for ; Sun, 28 Nov 2021 05:55:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233266AbhK1F6o (ORCPT ); Sun, 28 Nov 2021 00:58:44 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:45952 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232517AbhK1F4n (ORCPT ); Sun, 28 Nov 2021 00:56:43 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 5EDD51FD36; Sun, 28 Nov 2021 05:53:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1638078807; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9OeTUH5YPlOoejJ4fN/vPvwJ6mLW4jcHaXUbjR6gNgY=; b=fJJanMZMTmwhQASz5qnU51HcO0/quzgrKGAAsoirnW2kZ1h0Ol72jVdL3N8OB55hbNF8Wk OUB9XEf2C1pQJy9ZzZyandd+liaDymxkgfc6bDDMH2r1CirhpvHQd8RPSR51g4uKLzOuMv 7PRW5TecbMWqd+u/0TESy6mFXj/JUY8= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 
(256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 6693413446; Sun, 28 Nov 2021 05:53:26 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id WHPHDVYZo2G7fAAAMHmgww (envelope-from ); Sun, 28 Nov 2021 05:53:26 +0000
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH RFC 06/11] btrfs: make end_bio_extent_readpage() handle split bio properly
Date: Sun, 28 Nov 2021 13:52:54 +0800
Message-Id: <20211128055259.39249-7-wqu@suse.com>
X-Mailer: git-send-email 2.34.0
In-Reply-To: <20211128055259.39249-1-wqu@suse.com>
References: <20211128055259.39249-1-wqu@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

This involves the following modifications:

- Use __bio_for_each_segment() instead of bio_for_each_segment_all()
  bio_for_each_segment_all() iterates all bvecs, even those not
  referred to by the current bi_iter.
  The *_all() variant can only be used if the bio is never split.

  Switch to __bio_for_each_segment(), starting from the saved
  btrfs_bio::iter, so that endio is not called on the same range by
  both the split and the parent bios.

- Make check_data_csum() take bbio->offset_to_original into
  consideration
  Since a btrfs bio can now be split, both split and original bios can
  start at some offset from the original logical bytenr.
  Take btrfs_bio::offset_to_original into account to get the correct
  checksum offset.

- Remove the BIO_CLONED ASSERT() in submit_read_repair()

For the metadata path there is no change, as it only relies on the file
offset and doesn't care about btrfs_bio at all.
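To make the checksum offset arithmetic concrete, here is a minimal
userspace sketch (illustration only, not part of this patch). The helper
name model_csum_offset() is hypothetical; it assumes the parent bio owns
one contiguous csum array with csum_size bytes per sector:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: byte offset into the parent's csum array for the
 * sector at @bio_offset inside a bio that itself starts
 * @offset_to_original bytes into the original bio.
 */
size_t model_csum_offset(uint64_t offset_to_original, uint32_t bio_offset,
			 unsigned int sectorsize_bits, size_t csum_size)
{
	size_t base_sectors;
	size_t cur_sectors;

	/* Both offsets are expected to be sector aligned. */
	assert((offset_to_original & ((1ULL << sectorsize_bits) - 1)) == 0);
	assert((bio_offset & ((1U << sectorsize_bits) - 1)) == 0);

	base_sectors = (size_t)(offset_to_original >> sectorsize_bits);
	cur_sectors = bio_offset >> sectorsize_bits;
	return (base_sectors + cur_sectors) * csum_size;
}
```

For example, with 4K sectors and 4-byte checksums, a split bio starting
64K into the original skips 16 checksums before its own first sector.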
Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 34 +++++++++++++++++++--------------- fs/btrfs/inode.c | 23 +++++++++++++++++++++-- fs/btrfs/volumes.h | 3 ++- 3 files changed, 42 insertions(+), 18 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 34195891b0b5..67965faeef47 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2735,10 +2735,9 @@ static blk_status_t submit_read_repair(struct inode *inode, ASSERT(error_bitmap); /* - * We only get called on buffered IO, thus page must be mapped and bio - * must not be cloned. - */ - ASSERT(page->mapping && !bio_flagged(failed_bio, BIO_CLONED)); + * We only get called on buffered IO, thus page must be mapped + */ + ASSERT(page->mapping); /* Iterate through all the sectors in the range */ for (i = 0; i < nr_bits; i++) { @@ -2992,7 +2991,8 @@ static struct extent_buffer *find_extent_buffer_readpage( */ static void end_bio_extent_readpage(struct bio *bio) { - struct bio_vec *bvec; + struct bio_vec bvec; + struct bvec_iter iter; struct btrfs_bio *bbio = btrfs_bio(bio); struct extent_io_tree *tree, *failure_tree; struct processed_extent processed = { 0 }; @@ -3003,11 +3003,15 @@ static void end_bio_extent_readpage(struct bio *bio) u32 bio_offset = 0; int mirror; int ret; - struct bvec_iter_all iter_all; - bio_for_each_segment_all(bvec, bio, iter_all) { + /* + * We should have saved the orignal bi_iter, and then start iterating + * using that saved iter, as at endio time bi_iter is not reliable. + */ + ASSERT(bbio->iter.bi_size); + __bio_for_each_segment(bvec, bio, iter, bbio->iter) { bool uptodate = !bio->bi_status; - struct page *page = bvec->bv_page; + struct page *page = bvec.bv_page; struct inode *inode = page->mapping->host; struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); const u32 sectorsize = fs_info->sectorsize; @@ -3030,19 +3034,19 @@ static void end_bio_extent_readpage(struct bio *bio) * for unaligned offsets, and an error if they don't add up to * a full sector. 
*/ - if (!IS_ALIGNED(bvec->bv_offset, sectorsize)) + if (!IS_ALIGNED(bvec.bv_offset, sectorsize)) btrfs_err(fs_info, "partial page read in btrfs with offset %u and length %u", - bvec->bv_offset, bvec->bv_len); - else if (!IS_ALIGNED(bvec->bv_offset + bvec->bv_len, + bvec.bv_offset, bvec.bv_len); + else if (!IS_ALIGNED(bvec.bv_offset + bvec.bv_len, sectorsize)) btrfs_info(fs_info, "incomplete page read with offset %u and length %u", - bvec->bv_offset, bvec->bv_len); + bvec.bv_offset, bvec.bv_len); - start = page_offset(page) + bvec->bv_offset; - end = start + bvec->bv_len - 1; - len = bvec->bv_len; + start = page_offset(page) + bvec.bv_offset; + end = start + bvec.bv_len - 1; + len = bvec.bv_len; mirror = bbio->mirror_num; if (likely(uptodate)) { diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 8e6901aeeb89..1bf56c2b4bd9 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3220,6 +3220,24 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode, finish_ordered_fn, uptodate); } +static u8 *bbio_get_real_csum(struct btrfs_fs_info *fs_info, + struct btrfs_bio *bbio) +{ + u8 *ret; + + /* Split bbio needs to grab csum from its parent */ + if (bbio->is_split_bio) + ret = btrfs_bio(bbio->parent)->csum; + else + ret = bbio->csum; + + if (ret == NULL) + return ret; + + return ret + (bbio->offset_to_original >> fs_info->sectorsize_bits) * + fs_info->csum_size; +} + /* * check_data_csum - verify checksum of one sector of uncompressed data * @inode: inode @@ -3247,7 +3265,8 @@ static int check_data_csum(struct inode *inode, struct btrfs_bio *bbio, ASSERT(pgoff + len <= PAGE_SIZE); offset_sectors = bio_offset >> fs_info->sectorsize_bits; - csum_expected = ((u8 *)bbio->csum) + offset_sectors * csum_size; + csum_expected = bbio_get_real_csum(fs_info, bbio) + + offset_sectors * csum_size; kaddr = kmap_atomic(page); shash->tfm = fs_info->csum_shash; @@ -3305,7 +3324,7 @@ unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio, * Normally this should be 
covered by above check for compressed read * or the next check for NODATASUM. Just do a quicker exit here. */ - if (bbio->csum == NULL) + if (bbio_get_real_csum(fs_info, bbio) == NULL) return 0; if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index baccf895a544..38028926c679 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -401,7 +401,8 @@ static inline struct btrfs_bio *btrfs_bio(struct bio *bio) static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio) { - if (bbio->csum != bbio->csum_inline) { + /* Only free the csum if we're not a split bio */ + if (!bbio->is_split_bio && bbio->csum != bbio->csum_inline) { kfree(bbio->csum); bbio->csum = NULL; } From patchwork Sun Nov 28 05:52:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12642787 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81EDFC433F5 for ; Sun, 28 Nov 2021 05:55:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233339AbhK1F6p (ORCPT ); Sun, 28 Nov 2021 00:58:45 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:45960 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232548AbhK1F4p (ORCPT ); Sun, 28 Nov 2021 00:56:45 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id ADEF41FD37; Sun, 28 Nov 2021 05:53:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1638078808; 
h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YVt9OfxBECVG2p8rz9FvunW4H+fqBQcwg/n9PNHZfGs=; b=ObSMgp5mGa+qigS7+UKjidQug5LCgmTAiezt04Bvy4nFJ9ulyBSfBOqDOB1zW8qh6BamOh 6YeVrS3mWIXF6qEQnRCZ+7v3UrYvzIkw0n46m8fxPkt7TE7YZn8ITFAqZY7uiCmMCJ+fs1 6hGbW3j4K2gBKuJZx0xx7IEuHGTYIyA=
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id B5F3313446; Sun, 28 Nov 2021 05:53:27 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id eGhJIVcZo2G7fAAAMHmgww (envelope-from ); Sun, 28 Nov 2021 05:53:27 +0000
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH RFC 07/11] btrfs: make end_bio_extent_*_writepage() handle split bio properly
Date: Sun, 28 Nov 2021 13:52:55 +0800
Message-Id: <20211128055259.39249-8-wqu@suse.com>
X-Mailer: git-send-email 2.34.0
In-Reply-To: <20211128055259.39249-1-wqu@suse.com>
References: <20211128055259.39249-1-wqu@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

There are 3 call sites involved:

- end_bio_extent_writepage()
  For data writeback

- end_bio_subpage_eb_writepage()
  For subpage metadata writeback

- end_bio_extent_buffer_writepage()
  For regular metadata writeback

All those functions share the same modifications:

- Remove the ASSERT() that the bio is not cloned
- Use __bio_for_each_segment() with the saved btrfs_bio::iter
  Which can handle both unsplit and split bios properly.
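The difference between walking every bvec and walking only the range
described by a saved iter can be sketched in userspace C (illustration
only, not part of this patch). The model_* types and helpers are
hypothetical simplifications; the real bvec_iter carries more state:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec: only the length matters here. */
struct model_bvec {
	unsigned int len;
};

/* Hypothetical stand-in for struct bvec_iter. */
struct model_iter {
	unsigned int idx;	/* index of the first bvec in the range */
	unsigned int bvec_done;	/* bytes already consumed in that bvec */
	unsigned int size;	/* bytes covered by the range */
};

/* Walk every bvec, like an *_all() style loop: the iter is ignored. */
unsigned int model_bytes_all(const struct model_bvec *v, size_t nr)
{
	unsigned int total = 0;

	for (size_t i = 0; i < nr; i++)
		total += v[i].len;
	return total;
}

/* Walk only the bytes covered by @it (passed by value, so it is a copy). */
unsigned int model_bytes_in_iter(const struct model_bvec *v, size_t nr,
				 struct model_iter it)
{
	unsigned int total = 0;

	assert(v != NULL);
	while (it.size && it.idx < nr) {
		unsigned int avail = v[it.idx].len - it.bvec_done;
		unsigned int step = avail < it.size ? avail : it.size;

		total += step;
		it.size -= step;
		it.bvec_done += step;
		/* Move on once the current bvec is fully consumed. */
		if (it.bvec_done == v[it.idx].len) {
			it.idx++;
			it.bvec_done = 0;
		}
	}
	return total;
}
```

With four 4K bvecs split into two 8K halves, each half visits only its
own 8K, whereas the *_all() style walk would visit all 16K from both the
split and the parent bio.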
Signed-off-by: Qu Wenruo --- fs/btrfs/extent_io.c | 50 ++++++++++++++++++++++---------------------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 67965faeef47..91013c1dce25 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2827,31 +2827,31 @@ void end_extent_writepage(struct page *page, int err, u64 start, u64 end) static void end_bio_extent_writepage(struct bio *bio) { int error = blk_status_to_errno(bio->bi_status); - struct bio_vec *bvec; + struct bvec_iter iter; + struct bio_vec bvec; u64 start; u64 end; - struct bvec_iter_all iter_all; bool first_bvec = true; - ASSERT(!bio_flagged(bio, BIO_CLONED)); - bio_for_each_segment_all(bvec, bio, iter_all) { - struct page *page = bvec->bv_page; + ASSERT(btrfs_bio(bio)->iter.bi_size); + __bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter) { + struct page *page = bvec.bv_page; struct inode *inode = page->mapping->host; struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); const u32 sectorsize = fs_info->sectorsize; /* Our read/write should always be sector aligned. 
*/ - if (!IS_ALIGNED(bvec->bv_offset, sectorsize)) + if (!IS_ALIGNED(bvec.bv_offset, sectorsize)) btrfs_err(fs_info, "partial page write in btrfs with offset %u and length %u", - bvec->bv_offset, bvec->bv_len); - else if (!IS_ALIGNED(bvec->bv_len, sectorsize)) + bvec.bv_offset, bvec.bv_len); + else if (!IS_ALIGNED(bvec.bv_len, sectorsize)) btrfs_info(fs_info, "incomplete page write with offset %u and length %u", - bvec->bv_offset, bvec->bv_len); + bvec.bv_offset, bvec.bv_len); - start = page_offset(page) + bvec->bv_offset; - end = start + bvec->bv_len - 1; + start = page_offset(page) + bvec.bv_offset; + end = start + bvec.bv_len - 1; if (first_bvec) { btrfs_record_physical_zoned(inode, start, bio); @@ -2860,7 +2860,7 @@ static void end_bio_extent_writepage(struct bio *bio) end_extent_writepage(page, error, start, end); - btrfs_page_clear_writeback(fs_info, page, start, bvec->bv_len); + btrfs_page_clear_writeback(fs_info, page, start, bvec.bv_len); } bio_put(bio); @@ -4475,20 +4475,20 @@ static struct extent_buffer *find_extent_buffer_nolock( static void end_bio_subpage_eb_writepage(struct bio *bio) { struct btrfs_fs_info *fs_info; - struct bio_vec *bvec; - struct bvec_iter_all iter_all; + struct bvec_iter iter; + struct bio_vec bvec; fs_info = btrfs_sb(bio_first_page_all(bio)->mapping->host->i_sb); ASSERT(fs_info->sectorsize < PAGE_SIZE); - ASSERT(!bio_flagged(bio, BIO_CLONED)); - bio_for_each_segment_all(bvec, bio, iter_all) { - struct page *page = bvec->bv_page; - u64 bvec_start = page_offset(page) + bvec->bv_offset; - u64 bvec_end = bvec_start + bvec->bv_len - 1; + ASSERT(btrfs_bio(bio)->iter.bi_size); + __bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter) { + struct page *page = bvec.bv_page; + u64 bvec_start = page_offset(page) + bvec.bv_offset; + u64 bvec_end = bvec_start + bvec.bv_len - 1; u64 cur_bytenr = bvec_start; - ASSERT(IS_ALIGNED(bvec->bv_len, fs_info->nodesize)); + ASSERT(IS_ALIGNED(bvec.bv_len, fs_info->nodesize)); /* Iterate through all 
extent buffers in the range */ while (cur_bytenr <= bvec_end) { @@ -4531,14 +4531,14 @@ static void end_bio_subpage_eb_writepage(struct bio *bio) static void end_bio_extent_buffer_writepage(struct bio *bio) { - struct bio_vec *bvec; + struct bvec_iter iter; + struct bio_vec bvec; struct extent_buffer *eb; int done; - struct bvec_iter_all iter_all; - ASSERT(!bio_flagged(bio, BIO_CLONED)); - bio_for_each_segment_all(bvec, bio, iter_all) { - struct page *page = bvec->bv_page; + ASSERT(btrfs_bio(bio)->iter.bi_size); + __bio_for_each_segment(bvec, bio, iter, btrfs_bio(bio)->iter) { + struct page *page = bvec.bv_page; eb = (struct extent_buffer *)page->private; BUG_ON(!eb); From patchwork Sun Nov 28 05:52:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12642789 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12BCBC433FE for ; Sun, 28 Nov 2021 05:55:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233365AbhK1F6q (ORCPT ); Sun, 28 Nov 2021 00:58:46 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:45974 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232614AbhK1F4q (ORCPT ); Sun, 28 Nov 2021 00:56:46 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 092911FD41; Sun, 28 Nov 2021 05:53:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1638078810; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: 
mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rLOaZQzvrbLdTILy3pSJ8sHXuXBuGAn0ZnLbYvMbmfE=; b=iWgs6G+3ghQBwaauQmt631mW4V9+sk5viaYOH7ifuaAmsiBFhEUnj317PXRutEe0H8g5u7 /0+v5mgxKLQwp0cQj/xisfEVzExnQTF1tnFW+dCqJ032TSzle28kfPiuY0tpvnf0m1u7vw FE4KOOnTndfbCyxHwwip6tckCegtRaw=
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 100DB13446; Sun, 28 Nov 2021 05:53:28 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 6BRRNFgZo2G7fAAAMHmgww (envelope-from ); Sun, 28 Nov 2021 05:53:28 +0000
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH RFC 08/11] btrfs: allow btrfs_map_bio() to split bio according to chunk stripe boundaries
Date: Sun, 28 Nov 2021 13:52:56 +0800
Message-Id: <20211128055259.39249-9-wqu@suse.com>
X-Mailer: git-send-email 2.34.0
In-Reply-To: <20211128055259.39249-1-wqu@suse.com>
References: <20211128055259.39249-1-wqu@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

With the new btrfs_bio_split() helper, we are able to split a bio
according to chunk stripe boundaries at btrfs_map_bio() time.

Although currently, due to the bio split at submit_extent_page() time,
this ability is not yet utilized.
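The shape of the split loop can be sketched in userspace C (illustration
only, not part of this patch). The model assumes a fixed 64K stripe
length and a hypothetical model_map_length() helper in place of
__btrfs_map_block(); the real mapping depends on the chunk profile:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_STRIPE_LEN	(64 * 1024ULL)	/* assumed stripe length */

/*
 * Hypothetical stand-in for the mapping step: clamp @length so that the
 * range starting at @logical does not cross a stripe boundary.
 */
uint64_t model_map_length(uint64_t logical, uint64_t length)
{
	uint64_t to_boundary = MODEL_STRIPE_LEN - (logical % MODEL_STRIPE_LEN);

	return length < to_boundary ? length : to_boundary;
}

/*
 * Walk [logical, logical + length) one mapped range at a time, recording
 * the length of each piece. Returns the number of pieces needed; at most
 * @max_pieces entries are written to @pieces.
 */
size_t model_split_range(uint64_t logical, uint64_t length,
			 uint64_t *pieces, size_t max_pieces)
{
	const uint64_t end = logical + length;
	uint64_t cur = logical;
	size_t nr = 0;

	assert(pieces != NULL || max_pieces == 0);
	while (cur < end) {
		uint64_t map_length = model_map_length(cur, end - cur);

		/* A shorter map_length is where the real code would split. */
		if (nr < max_pieces)
			pieces[nr] = map_length;
		nr++;
		cur += map_length;
	}
	return nr;
}
```

In this model a 100K range starting at logical 48K ends up as three
pieces of 16K, 64K and 20K, and the last piece corresponds to reusing
the existing bio instead of splitting again.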
Signed-off-by: Qu Wenruo --- fs/btrfs/volumes.c | 53 +++++++++++++++++++++++++++++++--------------- 1 file changed, 36 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index e6ed71195e18..7f7e13e4caa3 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6878,29 +6878,48 @@ static int submit_one_mapped_range(struct btrfs_fs_info *fs_info, struct bio *bi blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num) { - u64 logical = bio->bi_iter.bi_sector << 9; - u64 length = 0; - u64 map_length; + const u64 orig_logical = bio->bi_iter.bi_sector << SECTOR_SHIFT; + const unsigned int orig_length = bio->bi_iter.bi_size; + const enum btrfs_map_op op = btrfs_op(bio); + u64 cur_logical = orig_logical; int ret; - struct btrfs_io_context *bioc = NULL; - length = bio->bi_iter.bi_size; - map_length = length; + while (cur_logical < orig_logical + orig_length) { + u64 map_length = orig_logical + orig_length - cur_logical; + struct btrfs_io_context *bioc = NULL; + struct bio *cur_bio; - btrfs_bio_counter_inc_blocked(fs_info); - ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical, - &map_length, &bioc, mirror_num, 1); - if (ret) { - btrfs_bio_counter_dec(fs_info); - return errno_to_blk_status(ret); - } + ret = __btrfs_map_block(fs_info, op, cur_logical, &map_length, + &bioc, mirror_num, 1); + if (ret) + return errno_to_blk_status(ret); - ret = submit_one_mapped_range(fs_info, bio, bioc, map_length, mirror_num); - if (ret < 0) { + if (cur_logical + map_length < orig_logical + orig_length) { + /* + * For now zoned write should never cross stripe + * boundary + */ + ASSERT(bio_op(bio) != REQ_OP_ZONE_APPEND); + + /* Need to split */ + cur_bio = btrfs_bio_split(fs_info, bio, map_length); + if (IS_ERR(cur_bio)) { + btrfs_put_bioc(bioc); + ret = PTR_ERR(cur_bio); + return errno_to_blk_status(ret); + } + } else { + /* Can use the existing bio directly */ + cur_bio = bio; + } + 
btrfs_bio_counter_inc_blocked(fs_info); + ret = submit_one_mapped_range(fs_info, cur_bio, bioc, + map_length, mirror_num); btrfs_bio_counter_dec(fs_info); - return errno_to_blk_status(ret); + if (ret < 0) + return errno_to_blk_status(ret); + cur_logical += map_length; } - btrfs_bio_counter_dec(fs_info); return BLK_STS_OK; } From patchwork Sun Nov 28 05:52:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12642791 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 071BCC43217 for ; Sun, 28 Nov 2021 05:55:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233379AbhK1F6s (ORCPT ); Sun, 28 Nov 2021 00:58:48 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:45984 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231383AbhK1F4r (ORCPT ); Sun, 28 Nov 2021 00:56:47 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 54FF61FD45; Sun, 28 Nov 2021 05:53:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1638078811; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Hb8NPmLXUMyExLegY106tgMVQ5lBVGfGfpKn8sWn6Dg=; b=B94DEOVOnwR7DaAclG8BlMzOYOhVTblNd3tH9dTx8OJGcZmeTV8FOFdLI+Dlqw/7pE2kkM /KjQG5DXoG7S94tPsf/dmkF7msB4TlbBVCgSkmbYK9DtRNIaaRLRI5vnJ+/w+Jl+MljLH4 pSizTfCGRt3jhvtN4+FBSinm/sJLG30= Received: 
from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 5F2DA13446; Sun, 28 Nov 2021 05:53:30 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id ONQeDFoZo2G7fAAAMHmgww (envelope-from ); Sun, 28 Nov 2021 05:53:30 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: linux-block@vger.kernel.org, dm-devel@redhat.com Subject: [PATCH RFC 09/11] btrfs: remove bio split operations in btrfs_submit_direct() Date: Sun, 28 Nov 2021 13:52:57 +0800 Message-Id: <20211128055259.39249-10-wqu@suse.com> X-Mailer: git-send-email 2.34.0 In-Reply-To: <20211128055259.39249-1-wqu@suse.com> References: <20211128055259.39249-1-wqu@suse.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-block@vger.kernel.org Since btrfs_map_bio() will handle the split, there is no need to do the split in btrfs_submit_direct() anymore. Signed-off-by: Qu Wenruo --- fs/btrfs/inode.c | 116 +++++++++-------------------------------------- 1 file changed, 22 insertions(+), 94 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 1bf56c2b4bd9..24c8bb6d8543 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8202,22 +8202,16 @@ static void btrfs_end_dio_bio(struct bio *bio) } static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, - struct inode *inode, u64 file_offset, int async_submit) + struct inode *inode, u64 file_offset) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct btrfs_dio_private *dip = bio->bi_private; bool write = btrfs_op(bio) == BTRFS_MAP_WRITE; + bool async_submit; blk_status_t ret; - /* - * Check btrfs_submit_data_bio() for rules about async submit. 
- * - * The only exception is for RAID56, when there are more than one bios - * to submit, async submit seems to make it harder to collect csums - * for the full stripe. - */ - if (async_submit) - async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers); + /* Check btrfs_submit_data_bio() for rules about async submit. */ + async_submit = !atomic_read(&BTRFS_I(inode)->sync_writers); if (!write) btrfs_bio(bio)->endio_type = BTRFS_WQ_ENDIO_DATA; @@ -8291,20 +8285,9 @@ static void btrfs_submit_direct(const struct iomap_iter *iter, { struct inode *inode = iter->inode; const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE); - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - const bool raid56 = (btrfs_data_alloc_profile(fs_info) & - BTRFS_BLOCK_GROUP_RAID56_MASK); struct btrfs_dio_private *dip; struct bio *bio; - u64 start_sector; - int async_submit = 0; - u64 submit_len; - u64 clone_offset = 0; - u64 clone_len; - u64 logical; - int ret; blk_status_t status; - struct btrfs_io_geometry geom; struct btrfs_dio_data *dio_data = iter->iomap.private; struct extent_map *em = NULL; @@ -8331,84 +8314,29 @@ static void btrfs_submit_direct(const struct iomap_iter *iter, goto out_err; } - start_sector = dio_bio->bi_iter.bi_sector; - submit_len = dio_bio->bi_iter.bi_size; - - do { - logical = start_sector << 9; - em = btrfs_get_chunk_map(fs_info, logical, submit_len); - if (IS_ERR(em)) { - status = errno_to_blk_status(PTR_ERR(em)); - em = NULL; - goto out_err_em; - } - ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio), - logical, &geom); - if (ret) { - status = errno_to_blk_status(ret); - goto out_err_em; - } - - clone_len = min(submit_len, geom.len); - ASSERT(clone_len <= UINT_MAX); - - /* - * This will never fail as it's passing GPF_NOFS and - * the allocation is backed by btrfs_bioset. 
- */ - bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len); - bio->bi_private = dip; - bio->bi_end_io = btrfs_end_dio_bio; - - if (bio_op(bio) == REQ_OP_ZONE_APPEND) { - status = extract_ordered_extent(BTRFS_I(inode), bio, - file_offset); - if (status) { - bio_put(bio); - goto out_err; - } - } - - ASSERT(submit_len >= clone_len); - submit_len -= clone_len; - - /* - * Increase the count before we submit the bio so we know - * the end IO handler won't happen before we increase the - * count. Otherwise, the dip might get freed before we're - * done setting it up. - * - * We transfer the initial reference to the last bio, so we - * don't need to increment the reference count for the last one. - */ - if (submit_len > 0) { - refcount_inc(&dip->refs); - /* - * If we are submitting more than one bio, submit them - * all asynchronously. The exception is RAID 5 or 6, as - * asynchronous checksums make it difficult to collect - * full stripe writes. - */ - if (!raid56) - async_submit = 1; - } + /* + * This will never fail as it's passing GPF_NOFS and + * the allocation is backed by btrfs_bioset. 
+ */ + bio = btrfs_bio_clone(dio_bio); + bio->bi_private = dip; + bio->bi_end_io = btrfs_end_dio_bio; - status = btrfs_submit_dio_bio(bio, inode, file_offset, - async_submit); + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + status = extract_ordered_extent(BTRFS_I(inode), bio, + file_offset); if (status) { bio_put(bio); - if (submit_len > 0) - refcount_dec(&dip->refs); - goto out_err_em; + goto out_err; } + } - dio_data->submitted += clone_len; - clone_offset += clone_len; - start_sector += clone_len >> 9; - file_offset += clone_len; - - free_extent_map(em); - } while (submit_len > 0); + status = btrfs_submit_dio_bio(bio, inode, file_offset); + if (status) { + bio_put(bio); + goto out_err_em; + } + dio_data->submitted += dio_bio->bi_iter.bi_size; return; out_err_em: From patchwork Sun Nov 28 05:52:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 12642795 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83FA0C433EF for ; Sun, 28 Nov 2021 05:57:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232851AbhK1GAn (ORCPT ); Sun, 28 Nov 2021 01:00:43 -0500 Received: from smtp-out2.suse.de ([195.135.220.29]:46046 "EHLO smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232766AbhK1F6m (ORCPT ); Sun, 28 Nov 2021 00:58:42 -0500 Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id A2DEF1FCA1; Sun, 28 Nov 2021 05:53:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; 
s=susede1; t=1638078812; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=HLucbzsCxyRVu0nLz48aeNMN8EZoET87ZEzYBa+Em3A=; b=ubOl6sspNw4Tqq+6q9ujI2DpJ2OU/HgSZ+45jF5agOrB33/xe/VVTS2qshY75Tvy4tivMz Crl8nw4LHviw7YQ5VaOurEFtYUwY/sHSb4T8rBd4ClZ/RCNKpZdag9SmGeDoTkKGetyky1 QOMGxbyYL48Cod8S31v/HGE53/L9IPg=
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id ABE8D13446; Sun, 28 Nov 2021 05:53:31 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id EHW2HlsZo2G7fAAAMHmgww (envelope-from ); Sun, 28 Nov 2021 05:53:31 +0000
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH RFC 10/11] btrfs: remove btrfs_bio_ctrl::len_to_stripe_boundary
Date: Sun, 28 Nov 2021 13:52:58 +0800
Message-Id: <20211128055259.39249-11-wqu@suse.com>
X-Mailer: git-send-email 2.34.0
In-Reply-To: <20211128055259.39249-1-wqu@suse.com>
References: <20211128055259.39249-1-wqu@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

Since we can split bio at btrfs_map_bio() time, there is no need to do
bio split for stripe boundaries at submit_extent_page() time.
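With the stripe limit gone, the amount of data that may still be added
to a bio depends only on the ordered extent boundary. A minimal
userspace sketch of that computation (illustration only, not part of
this patch; model_real_size() is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many of @size bytes still fit into a bio that
 * already holds @bio_size bytes and must not grow past
 * @len_to_oe_boundary bytes.
 */
uint32_t model_real_size(uint32_t len_to_oe_boundary, uint32_t bio_size,
			 uint32_t size)
{
	uint32_t room;

	/* The bio is assumed to never exceed the boundary already. */
	assert(bio_size <= len_to_oe_boundary);
	room = len_to_oe_boundary - bio_size;
	return room < size ? room : size;
}
```

A result of 0 means the bio is full and nothing more should be added to
it in this model.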
Signed-off-by: Qu Wenruo
---
 fs/btrfs/extent_io.c | 23 ++---------------------
 fs/btrfs/extent_io.h |  1 -
 2 files changed, 2 insertions(+), 22 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 91013c1dce25..9c845b2c50f8 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3295,7 +3295,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
 
 	ASSERT(bio);
 	/* The limit should be calculated when bio_ctrl->bio is allocated */
-	ASSERT(bio_ctrl->len_to_oe_boundary && bio_ctrl->len_to_stripe_boundary);
+	ASSERT(bio_ctrl->len_to_oe_boundary);
 	if (bio_ctrl->bio_flags != bio_flags)
 		return 0;
 
@@ -3306,9 +3306,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl,
 	if (!contig)
 		return 0;
 
-	real_size = min(bio_ctrl->len_to_oe_boundary,
-			bio_ctrl->len_to_stripe_boundary) - bio_size;
-	real_size = min(real_size, size);
+	real_size = min(bio_ctrl->len_to_oe_boundary - bio_size, size);
 
 	/*
 	 * If real_size is 0, never call bio_add_*_page(), as even size is 0,
@@ -3329,11 +3327,8 @@ static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
 			       struct btrfs_inode *inode, u64 file_offset)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_io_geometry geom;
 	struct btrfs_ordered_extent *ordered;
-	struct extent_map *em;
 	u64 logical = (bio_ctrl->bio->bi_iter.bi_sector << SECTOR_SHIFT);
-	int ret;
 
 	/*
 	 * Pages for compressed extent are never submitted to disk directly,
@@ -3344,22 +3339,8 @@ static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl,
 	 */
 	if (bio_ctrl->bio_flags & EXTENT_BIO_COMPRESSED) {
 		bio_ctrl->len_to_oe_boundary = U32_MAX;
-		bio_ctrl->len_to_stripe_boundary = U32_MAX;
 		return 0;
 	}
-	em = btrfs_get_chunk_map(fs_info, logical, fs_info->sectorsize);
-	if (IS_ERR(em))
-		return PTR_ERR(em);
-	ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio_ctrl->bio),
-				    logical, &geom);
-	free_extent_map(em);
-	if (ret < 0) {
-		return ret;
-	}
-	if (geom.len > U32_MAX)
-		bio_ctrl->len_to_stripe_boundary = U32_MAX;
-	else
-		bio_ctrl->len_to_stripe_boundary = (u32)geom.len;
 
 	if (!btrfs_is_zoned(fs_info) ||
 	    bio_op(bio_ctrl->bio) != REQ_OP_ZONE_APPEND) {
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index cb727b77ecda..b4897597b445 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -109,7 +109,6 @@ struct extent_buffer {
 struct btrfs_bio_ctrl {
 	struct bio *bio;
 	unsigned long bio_flags;
-	u32 len_to_stripe_boundary;
 	u32 len_to_oe_boundary;
 };
 

From patchwork Sun Nov 28 05:52:59 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 12642793
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 3A227C433EF
	for ; Sun, 28 Nov 2021 05:57:28 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232377AbhK1GAm (ORCPT );
	Sun, 28 Nov 2021 01:00:42 -0500
Received: from smtp-out2.suse.de ([195.135.220.29]:46048 "EHLO
	smtp-out2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232851AbhK1F6m (ORCPT );
	Sun, 28 Nov 2021 00:58:42 -0500
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
	(No client certificate requested)
	by smtp-out2.suse.de (Postfix) with ESMTPS id F10021FD46;
	Sun, 28 Nov 2021 05:53:33 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1;
	t=1638078813;
	h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
	 mime-version:mime-version:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=vOZb81iwrYdiJvqJLEyz8wCyfDYl486gCXA8Z3dGJQA=;
	b=lQCzofwuiOr3/4/hUMAIdsnlozUxD1HHt97WxwDTNcG8IqCj2jIIjYueXtiOTtKDzUnXOs
	 QEaOFZcXBXRKT0GQayNX8oyCrfYD8GmzjqdI2zc75SwL8a4MUNFLGqZfzMI/9VCJAJ0Mgj
	 xfHEiTq7rlSjw8JR8GPsDLP7lGZf+QQ=
Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512)
	(No client certificate requested)
	by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 0564713446;
	Sun, 28 Nov 2021 05:53:32 +0000 (UTC)
Received: from dovecot-director2.suse.de ([192.168.254.65])
	by imap2.suse-dmz.suse.de with ESMTPSA
	id WFHFMVwZo2G7fAAAMHmgww
	(envelope-from ); Sun, 28 Nov 2021 05:53:32 +0000
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: [PATCH RFC 11/11] btrfs: temporarily disable RAID56
Date: Sun, 28 Nov 2021 13:52:59 +0800
Message-Id: <20211128055259.39249-12-wqu@suse.com>
X-Mailer: git-send-email 2.34.0
In-Reply-To: <20211128055259.39249-1-wqu@suse.com>
References: <20211128055259.39249-1-wqu@suse.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

!!! DON'T MERGE THIS COMMIT !!!

There are still some bugs buried deeply inside RAID56 code which is not
yet compatible with bio split at btrfs_map_bio() time, disable it for
now.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/ctree.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f1dd2486dcb3..aa0b92312f43 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -295,6 +295,10 @@ static_assert(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
 #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET	0ULL
 #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR	0ULL
 
+/*
+ * Temporarily remove RAID56 from support list due to conflicts with bio
+ * split at btrfs_map_bio() time.
+ */
 #define BTRFS_FEATURE_INCOMPAT_SUPP			\
 	(BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF |		\
 	 BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL |	\
@@ -302,7 +306,6 @@ static_assert(sizeof(struct btrfs_super_block) == BTRFS_SUPER_INFO_SIZE);
 	 BTRFS_FEATURE_INCOMPAT_BIG_METADATA |		\
 	 BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO |		\
 	 BTRFS_FEATURE_INCOMPAT_COMPRESS_ZSTD |		\
-	 BTRFS_FEATURE_INCOMPAT_RAID56 |		\
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES |		\