From patchwork Fri Feb 17 05:36:58 2023
X-Patchwork-Submitter: Qu Wenruo
X-Patchwork-Id: 13144330
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Johannes Thumshirn, Christoph Hellwig
Subject: [PATCH 1/6] btrfs: remove map_lookup->stripe_len
Date: Fri, 17 Feb 2023 13:36:58 +0800
Message-Id: <35172b54c4276a7ff4bb093156d09e4cb7a9a160.1676611535.git.wqu@suse.com>

Currently btrfs doesn't support stripe lengths other than 64KiB, and this is
already enforced by the tree-checker. Thus there is no point in recording that
fixed value in map_lookup; it can all be replaced with BTRFS_STRIPE_LEN.

Furthermore we can use the fixed stripe length for the following optimizations:

- Use BTRFS_STRIPE_LEN_SHIFT to replace some 64bit division
  Now we only need to do a right shift. And the value of BTRFS_STRIPE_LEN
  itself is already too large to be used as a shift amount, so if we
  accidentally use BTRFS_STRIPE_LEN in a bit shift, a compiler warning is
  triggered. Thus this bit shift optimization is very safe.
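  For illustration, a minimal userspace sketch (not part of the patch) of the
  shift/mask idiom behind this point and the next; the STRIPE_LEN* names below
  are local stand-ins for the btrfs constants:

  #include <stdint.h>
  #include <stdio.h>

  #define STRIPE_LEN       (64 * 1024ULL)   /* 64KiB, mirrors BTRFS_STRIPE_LEN */
  #define STRIPE_LEN_SHIFT 16               /* log2 of 64KiB */
  #define STRIPE_LEN_MASK  (STRIPE_LEN - 1)

  int main(void)
  {
  	/* 3 full stripes plus 4KiB into the 4th stripe. */
  	uint64_t offset = (3ULL << STRIPE_LEN_SHIFT) + 4096;

  	/* Old style: 64-bit division and remainder. */
  	uint64_t stripe_nr_div  = offset / STRIPE_LEN;
  	uint64_t stripe_off_rem = offset % STRIPE_LEN;

  	/* New style: a right shift gives the stripe number, a mask gives the
  	 * offset inside the stripe.  Both pairs print the same values. */
  	uint64_t stripe_nr_shift = offset >> STRIPE_LEN_SHIFT;
  	uint64_t stripe_off_mask = offset & STRIPE_LEN_MASK;

  	printf("div/rem: nr=%llu off=%llu  shift/mask: nr=%llu off=%llu\n",
  	       (unsigned long long)stripe_nr_div,
  	       (unsigned long long)stripe_off_rem,
  	       (unsigned long long)stripe_nr_shift,
  	       (unsigned long long)stripe_off_mask);
  	return 0;
  }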
- Use BTRFS_STRIPE_LEN_MASK to calculate the offset inside a stripe Reviewed-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig Signed-off-by: Qu Wenruo --- fs/btrfs/block-group.c | 10 +++-- fs/btrfs/scrub.c | 43 ++++++++++---------- fs/btrfs/tests/extent-map-tests.c | 1 - fs/btrfs/volumes.c | 65 ++++++++++++++++--------------- fs/btrfs/volumes.h | 4 +- 5 files changed, 64 insertions(+), 59 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 5b10401d803b..ac22c55be440 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1992,12 +1992,12 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start, map = em->map_lookup; data_stripe_length = em->orig_block_len; - io_stripe_size = map->stripe_len; + io_stripe_size = BTRFS_STRIPE_LEN; chunk_start = em->start; /* For RAID5/6 adjust to a full IO stripe length */ if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) - io_stripe_size = map->stripe_len * nr_data_stripes(map); + io_stripe_size = nr_data_stripes(map) << BTRFS_STRIPE_LEN_SHIFT; buf = kcalloc(map->num_stripes, sizeof(u64), GFP_NOFS); if (!buf) { @@ -2015,8 +2015,10 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start, data_stripe_length)) continue; - stripe_nr = physical - map->stripes[i].physical; - stripe_nr = div64_u64_rem(stripe_nr, map->stripe_len, &offset); + stripe_nr = (physical - map->stripes[i].physical) >> + BTRFS_STRIPE_LEN_SHIFT; + offset = (physical - map->stripes[i].physical) & + BTRFS_STRIPE_LEN_MASK; if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)) { diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 69c93ae333f6..190536b90779 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -2722,7 +2722,7 @@ static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map, if (flags & BTRFS_EXTENT_FLAG_DATA) { if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) - blocksize = map->stripe_len; + blocksize = BTRFS_STRIPE_LEN; else blocksize = sctx->fs_info->sectorsize; spin_lock(&sctx->stat_lock); @@ -2731,7 +2731,7 @@ static int scrub_extent(struct scrub_ctx *sctx, struct map_lookup *map, spin_unlock(&sctx->stat_lock); } else if (flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) { if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) - blocksize = map->stripe_len; + blocksize = BTRFS_STRIPE_LEN; else blocksize = sctx->fs_info->nodesize; spin_lock(&sctx->stat_lock); @@ -2920,9 +2920,9 @@ static int get_raid56_logic_offset(u64 physical, int num, *offset = last_offset; for (i = 0; i < data_stripes; i++) { - *offset = last_offset + i * map->stripe_len; + *offset = last_offset + (i << BTRFS_STRIPE_LEN_SHIFT); - stripe_nr = div64_u64(*offset, map->stripe_len); + stripe_nr = *offset >> BTRFS_STRIPE_LEN_SHIFT; stripe_nr = div_u64(stripe_nr, data_stripes); /* Work out the disk rotation on this stripe-set */ @@ -2935,7 +2935,7 @@ static int get_raid56_logic_offset(u64 physical, int num, if (stripe_index < num) j++; } - *offset = last_offset + j * map->stripe_len; + *offset = last_offset + (j << BTRFS_STRIPE_LEN_SHIFT); return 1; } @@ -3205,7 +3205,7 @@ static int scrub_raid56_data_stripe_for_parity(struct scrub_ctx *sctx, /* Path must not be populated */ ASSERT(!path->nodes[0]); - while (cur_logical < logical + map->stripe_len) { + while (cur_logical < logical + BTRFS_STRIPE_LEN) { struct btrfs_io_context *bioc = NULL; struct btrfs_device *extent_dev; u64 extent_start; @@ -3217,7 +3217,7 @@ static int scrub_raid56_data_stripe_for_parity(struct scrub_ctx *sctx, u64 extent_mirror_num; ret = 
find_first_extent_item(extent_root, path, cur_logical, - logical + map->stripe_len - cur_logical); + logical + BTRFS_STRIPE_LEN - cur_logical); /* No more extent item in this data stripe */ if (ret > 0) { ret = 0; @@ -3231,7 +3231,7 @@ static int scrub_raid56_data_stripe_for_parity(struct scrub_ctx *sctx, /* Metadata should not cross stripe boundaries */ if ((extent_flags & BTRFS_EXTENT_FLAG_TREE_BLOCK) && does_range_cross_boundary(extent_start, extent_size, - logical, map->stripe_len)) { + logical, BTRFS_STRIPE_LEN)) { btrfs_err(fs_info, "scrub: tree block %llu spanning stripes, ignored. logical=%llu", extent_start, logical); @@ -3247,7 +3247,7 @@ static int scrub_raid56_data_stripe_for_parity(struct scrub_ctx *sctx, /* Truncate the range inside this data stripe */ extent_size = min(extent_start + extent_size, - logical + map->stripe_len) - cur_logical; + logical + BTRFS_STRIPE_LEN) - cur_logical; extent_start = cur_logical; ASSERT(extent_size <= U32_MAX); @@ -3320,8 +3320,7 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, path->search_commit_root = 1; path->skip_locking = 1; - ASSERT(map->stripe_len <= U32_MAX); - nsectors = map->stripe_len >> fs_info->sectorsize_bits; + nsectors = BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits; ASSERT(nsectors <= BITS_PER_LONG); sparity = kzalloc(sizeof(struct scrub_parity), GFP_NOFS); if (!sparity) { @@ -3332,8 +3331,7 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, return -ENOMEM; } - ASSERT(map->stripe_len <= U32_MAX); - sparity->stripe_len = map->stripe_len; + sparity->stripe_len = BTRFS_STRIPE_LEN; sparity->nsectors = nsectors; sparity->sctx = sctx; sparity->scrub_dev = sdev; @@ -3344,7 +3342,7 @@ static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx, ret = 0; for (cur_logical = logic_start; cur_logical < logic_end; - cur_logical += map->stripe_len) { + cur_logical += BTRFS_STRIPE_LEN) { ret = scrub_raid56_data_stripe_for_parity(sctx, sparity, map, sdev, path, cur_logical); if (ret < 0) @@ -3536,7 +3534,7 @@ static u64 simple_stripe_full_stripe_len(const struct map_lookup *map) ASSERT(map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)); - return map->num_stripes / map->sub_stripes * map->stripe_len; + return (map->num_stripes / map->sub_stripes) << BTRFS_STRIPE_LEN_SHIFT; } /* Get the logical bytenr for the stripe */ @@ -3552,7 +3550,8 @@ static u64 simple_stripe_get_logical(struct map_lookup *map, * (stripe_index / sub_stripes) gives how many data stripes we need to * skip. */ - return (stripe_index / map->sub_stripes) * map->stripe_len + bg->start; + return ((stripe_index / map->sub_stripes) << BTRFS_STRIPE_LEN_SHIFT) + + bg->start; } /* Get the mirror number for the stripe */ @@ -3589,14 +3588,14 @@ static int scrub_simple_stripe(struct scrub_ctx *sctx, * this stripe. 
*/ ret = scrub_simple_mirror(sctx, extent_root, csum_root, bg, map, - cur_logical, map->stripe_len, device, + cur_logical, BTRFS_STRIPE_LEN, device, cur_physical, mirror_num); if (ret) return ret; /* Skip to next stripe which belongs to the target device */ cur_logical += logical_increment; /* For physical offset, we just go to next stripe */ - cur_physical += map->stripe_len; + cur_physical += BTRFS_STRIPE_LEN; } return ret; } @@ -3690,7 +3689,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, if (profile & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)) { ret = scrub_simple_stripe(sctx, root, csum_root, bg, map, scrub_dev, stripe_index); - offset = map->stripe_len * (stripe_index / map->sub_stripes); + offset = (stripe_index / map->sub_stripes) << BTRFS_STRIPE_LEN_SHIFT; goto out; } @@ -3705,7 +3704,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, /* Initialize @offset in case we need to go to out: label */ get_raid56_logic_offset(physical, stripe_index, map, &offset, NULL); - increment = map->stripe_len * nr_data_stripes(map); + increment = nr_data_stripes(map) << BTRFS_STRIPE_LEN_SHIFT; /* * Due to the rotation, for RAID56 it's better to iterate each stripe @@ -3736,13 +3735,13 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, * is still based on @mirror_num. */ ret = scrub_simple_mirror(sctx, root, csum_root, bg, map, - logical, map->stripe_len, + logical, BTRFS_STRIPE_LEN, scrub_dev, physical, 1); if (ret < 0) goto out; next: logical += increment; - physical += map->stripe_len; + physical += BTRFS_STRIPE_LEN; spin_lock(&sctx->stat_lock); if (stop_loop) sctx->stat.last_physical = diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c index f2f2e11dac4c..ed0f36ae5346 100644 --- a/fs/btrfs/tests/extent-map-tests.c +++ b/fs/btrfs/tests/extent-map-tests.c @@ -486,7 +486,6 @@ static int test_rmap_block(struct btrfs_fs_info *fs_info, em->map_lookup = map; map->num_stripes = test->num_stripes; - map->stripe_len = BTRFS_STRIPE_LEN; map->type = test->raid_type; for (i = 0; i < map->num_stripes; i++) { diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 7823168c08a6..00ce6772c1cd 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -5109,7 +5109,7 @@ static void init_alloc_chunk_ctl_policy_regular( /* We don't want a chunk larger than 10% of writable space */ ctl->max_chunk_size = min(mult_perc(fs_devices->total_rw_bytes, 10), ctl->max_chunk_size); - ctl->dev_extent_min = BTRFS_STRIPE_LEN * ctl->dev_stripes; + ctl->dev_extent_min = ctl->dev_stripes << BTRFS_STRIPE_LEN_SHIFT; } static void init_alloc_chunk_ctl_policy_zoned( @@ -5391,7 +5391,6 @@ static struct btrfs_block_group *create_chunk(struct btrfs_trans_handle *trans, j * ctl->stripe_size; } } - map->stripe_len = BTRFS_STRIPE_LEN; map->io_align = BTRFS_STRIPE_LEN; map->io_width = BTRFS_STRIPE_LEN; map->type = type; @@ -5599,11 +5598,11 @@ int btrfs_chunk_alloc_add_chunk_item(struct btrfs_trans_handle *trans, btrfs_set_stack_chunk_length(chunk, bg->length); btrfs_set_stack_chunk_owner(chunk, BTRFS_EXTENT_TREE_OBJECTID); - btrfs_set_stack_chunk_stripe_len(chunk, map->stripe_len); + btrfs_set_stack_chunk_stripe_len(chunk, BTRFS_STRIPE_LEN); btrfs_set_stack_chunk_type(chunk, map->type); btrfs_set_stack_chunk_num_stripes(chunk, map->num_stripes); - btrfs_set_stack_chunk_io_align(chunk, map->stripe_len); - btrfs_set_stack_chunk_io_width(chunk, map->stripe_len); + btrfs_set_stack_chunk_io_align(chunk, BTRFS_STRIPE_LEN); + 
btrfs_set_stack_chunk_io_width(chunk, BTRFS_STRIPE_LEN); btrfs_set_stack_chunk_sector_size(chunk, fs_info->sectorsize); btrfs_set_stack_chunk_sub_stripes(chunk, map->sub_stripes); @@ -5793,7 +5792,7 @@ unsigned long btrfs_full_stripe_len(struct btrfs_fs_info *fs_info, if (!WARN_ON(IS_ERR(em))) { map = em->map_lookup; if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) - len = map->stripe_len * nr_data_stripes(map); + len = nr_data_stripes(map) << BTRFS_STRIPE_LEN_SHIFT; free_extent_map(em); } return len; @@ -5959,7 +5958,6 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info, u64 stripe_nr_end; u64 stripe_end_offset; u64 stripe_cnt; - u64 stripe_len; u64 stripe_offset; u32 stripe_index; u32 factor = 0; @@ -5980,26 +5978,25 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info, if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) { ret = -EOPNOTSUPP; goto out_free_map; -} + } offset = logical - em->start; length = min_t(u64, em->start + em->len - logical, length); *length_ret = length; - stripe_len = map->stripe_len; /* * stripe_nr counts the total number of stripes we have to stride * to get to this block */ - stripe_nr = div64_u64(offset, stripe_len); + stripe_nr = offset >> BTRFS_STRIPE_LEN_SHIFT; /* stripe_offset is the offset of this block in its stripe */ - stripe_offset = offset - stripe_nr * stripe_len; + stripe_offset = offset - (stripe_nr << BTRFS_STRIPE_LEN_SHIFT); - stripe_nr_end = round_up(offset + length, map->stripe_len); - stripe_nr_end = div64_u64(stripe_nr_end, map->stripe_len); + stripe_nr_end = round_up(offset + length, BTRFS_STRIPE_LEN) >> + BTRFS_STRIPE_LEN_SHIFT; stripe_cnt = stripe_nr_end - stripe_nr; - stripe_end_offset = stripe_nr_end * map->stripe_len - + stripe_end_offset = (stripe_nr_end << BTRFS_STRIPE_LEN_SHIFT) - (offset + length); /* * after this, stripe_nr is the number of stripes on this @@ -6041,15 +6038,15 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info, for (i = 0; i < *num_stripes; i++) { stripes[i].physical = map->stripes[stripe_index].physical + - stripe_offset + stripe_nr * map->stripe_len; + stripe_offset + (stripe_nr << BTRFS_STRIPE_LEN_SHIFT); stripes[i].dev = map->stripes[stripe_index].dev; if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID10)) { - stripes[i].length = stripes_per_dev * map->stripe_len; + stripes[i].length = stripes_per_dev << BTRFS_STRIPE_LEN_SHIFT; if (i / sub_stripes < remaining_stripes) - stripes[i].length += map->stripe_len; + stripes[i].length += BTRFS_STRIPE_LEN; /* * Special for the first stripe and @@ -6288,22 +6285,32 @@ static u64 btrfs_max_io_len(struct map_lookup *map, enum btrfs_map_op op, u64 offset, u64 *stripe_nr, u64 *stripe_offset, u64 *full_stripe_start) { - u32 stripe_len = map->stripe_len; - ASSERT(op != BTRFS_MAP_DISCARD); /* * Stripe_nr is the stripe where this block falls. stripe_offset is * the offset of this block in its stripe. */ - *stripe_nr = div64_u64_rem(offset, stripe_len, stripe_offset); + *stripe_offset = offset & BTRFS_STRIPE_LEN_MASK; + *stripe_nr = offset >> BTRFS_STRIPE_LEN_SHIFT; ASSERT(*stripe_offset < U32_MAX); if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) { - unsigned long full_stripe_len = stripe_len * nr_data_stripes(map); + unsigned long full_stripe_len = nr_data_stripes(map) << + BTRFS_STRIPE_LEN_SHIFT; + /* + * For full stripe start, we use previously calculated + * @stripe_nr, just align it to nr_data_stripes, then + * multiply with STRIPE_LEN, we can call it a day. 
+ * + * By this, we can avoid u64 division completely. + * And we have to go rounddown(), not round_down(), as + * nr_data_stripes is not ensured to be power of 2. + */ *full_stripe_start = - div64_u64(offset, full_stripe_len) * full_stripe_len; + rounddown(*stripe_nr, nr_data_stripes(map)) << + BTRFS_STRIPE_LEN_SHIFT; /* * For writes to RAID56, allow to write a full stripe set, but @@ -6318,7 +6325,7 @@ static u64 btrfs_max_io_len(struct map_lookup *map, enum btrfs_map_op op, * a single disk). */ if (map->type & BTRFS_BLOCK_GROUP_STRIPE_MASK) - return stripe_len - *stripe_offset; + return BTRFS_STRIPE_LEN - *stripe_offset; return U64_MAX; } @@ -6327,7 +6334,7 @@ static void set_io_stripe(struct btrfs_io_stripe *dst, const struct map_lookup * { dst->dev = map->stripes[stripe_index].dev; dst->physical = map->stripes[stripe_index].physical + - stripe_offset + stripe_nr * map->stripe_len; + stripe_offset + (stripe_nr << BTRFS_STRIPE_LEN_SHIFT); } int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, @@ -6341,7 +6348,6 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, u64 map_offset; u64 stripe_offset; u64 stripe_nr; - u64 stripe_len; u32 stripe_index; int data_stripes; int i; @@ -6367,7 +6373,6 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, map = em->map_lookup; data_stripes = nr_data_stripes(map); - stripe_len = map->stripe_len; map_offset = logical - em->start; max_len = btrfs_max_io_len(map, op, map_offset, &stripe_nr, @@ -6443,11 +6448,10 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, } } else if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) { - ASSERT(map->stripe_len == BTRFS_STRIPE_LEN); if (need_raid_map && (need_full_stripe(op) || mirror_num > 1)) { /* push stripe_nr back to the start of the full stripe */ stripe_nr = div64_u64(raid56_full_stripe_start, - stripe_len * data_stripes); + data_stripes << BTRFS_STRIPE_LEN_SHIFT); /* RAID[56] write or recovery. 
Return all stripes */ num_stripes = map->num_stripes; @@ -6456,7 +6460,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, /* Return the length to the full stripe end */ *length = min(logical + *length, raid56_full_stripe_start + em->start + - data_stripes * stripe_len) - logical; + (data_stripes << BTRFS_STRIPE_LEN_SHIFT)) - logical; stripe_index = 0; stripe_offset = 0; } else { @@ -6551,7 +6555,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, tmp = stripe_nr * data_stripes; for (i = 0; i < data_stripes; i++) bioc->raid_map[(i + rot) % num_stripes] = - em->start + (tmp + i) * map->stripe_len; + em->start + ((tmp + i) << BTRFS_STRIPE_LEN_SHIFT); bioc->raid_map[(i + rot) % map->num_stripes] = RAID5_P_STRIPE; if (map->type & BTRFS_BLOCK_GROUP_RAID6) @@ -6924,7 +6928,6 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf, map->num_stripes = num_stripes; map->io_width = btrfs_chunk_io_width(leaf, chunk); map->io_align = btrfs_chunk_io_align(leaf, chunk); - map->stripe_len = btrfs_chunk_stripe_len(leaf, chunk); map->type = type; /* * We can't use the sub_stripes value, as for profiles other than diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 7e51f2238f72..dad4a5d39822 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -19,6 +19,9 @@ extern struct mutex uuid_mutex; #define BTRFS_STRIPE_LEN SZ_64K +#define BTRFS_STRIPE_LEN_SHIFT (const_ilog2(BTRFS_STRIPE_LEN)) +#define BTRFS_STRIPE_LEN_MASK (BTRFS_STRIPE_LEN - 1) + /* Used by sanity check for btrfs_raid_types. */ #define const_ffs(n) (__builtin_ctzll(n) + 1) @@ -446,7 +449,6 @@ struct map_lookup { u64 type; int io_align; int io_width; - u32 stripe_len; int num_stripes; int sub_stripes; int verified_stripes; /* For mount time dev extent verification */ From patchwork Fri Feb 17 05:36:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13144331 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 64E88C636D6 for ; Fri, 17 Feb 2023 05:37:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229557AbjBQFhk (ORCPT ); Fri, 17 Feb 2023 00:37:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43928 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229551AbjBQFhi (ORCPT ); Fri, 17 Feb 2023 00:37:38 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D80B85BD9E for ; Thu, 16 Feb 2023 21:37:25 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 813A220078 for ; Fri, 17 Feb 2023 05:37:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1676612244; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=rCKIDEzUoooOFNYi2SK0z1lONa33/nLUqP8V+oSTqnE=; 
b=KiN43I0viIRmS1UfOcpP7o36oyIY+ijWBCL7x2LADtWB1hxgJz/0alWiSg9JcfxRIHbtGZ SRbtO3PgscEi6IBX3vaeo013HZwU1RQW+HrEMhtu0xCOR7xn4s7EUrkRvfBmCDYmov8mm5 bCLfsQor85cGqSJSdEJp40MlkusizmQ= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id D68151323E for ; Fri, 17 Feb 2023 05:37:23 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id QFsAKJMS72PTSQAAMHmgww (envelope-from ) for ; Fri, 17 Feb 2023 05:37:23 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Subject: [PATCH 2/6] btrfs: reduce div64 calls by limiting the number of stripes of a chunk to u32 Date: Fri, 17 Feb 2023 13:36:59 +0800 Message-Id: X-Mailer: git-send-email 2.39.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org There are quite some div64 calls inside btrfs_map_block() and its variants. Such calls are for @stripe_nr, where @stripe_nr is the number of stripes before our logical bytenr inside a chunk. However we can eliminate such div64 calls by just reducing the width of @stripe_nr from 64 to 32. This can be done because our chunk size limit is already 10G, with fixed stripe length 64K. Thus a U32 is definitely enough to contain the number of stripes. With such width reduction, we can get rid of slower div64, and extra warning for certain 32bit arch. This patch would do: - Add a new tree-checker chunk validation on chunk length Make sure no chunk can reach 256G, which can also act as a bitflip checker. - Reduce the width from u64 to u32 for @stripe_nr variables - Replace unnecessary div64 calls with regular modulo and division 32bit division and modulo are much faster than 64bit operations, and we are finally free of the div64 fear at least in those involved functions. Signed-off-by: Qu Wenruo --- fs/btrfs/block-group.c | 12 ++++---- fs/btrfs/scrub.c | 14 +++++---- fs/btrfs/tree-checker.c | 14 +++++++++ fs/btrfs/volumes.c | 65 +++++++++++++++++++++++------------------ 4 files changed, 63 insertions(+), 42 deletions(-) diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index ac22c55be440..f37477107a94 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -2007,8 +2007,8 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start, for (i = 0; i < map->num_stripes; i++) { bool already_inserted = false; - u64 stripe_nr; - u64 offset; + u32 stripe_nr; + u32 offset; int j; if (!in_range(physical, map->stripes[i].physical, @@ -2021,16 +2021,14 @@ int btrfs_rmap_block(struct btrfs_fs_info *fs_info, u64 chunk_start, BTRFS_STRIPE_LEN_MASK; if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | - BTRFS_BLOCK_GROUP_RAID10)) { - stripe_nr = stripe_nr * map->num_stripes + i; - stripe_nr = div_u64(stripe_nr, map->sub_stripes); - } + BTRFS_BLOCK_GROUP_RAID10)) + stripe_nr = div_u64(stripe_nr * map->num_stripes + i, + map->sub_stripes); /* * The remaining case would be for RAID56, multiply by * nr_data_stripes(). 
Alternatively, just use rmap_len below * instead of map->stripe_len */ - bytenr = chunk_start + stripe_nr * io_stripe_size + offset; /* Ensure we don't add duplicate addresses */ diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 190536b90779..3b4e621e822f 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -2908,10 +2908,7 @@ static int get_raid56_logic_offset(u64 physical, int num, { int i; int j = 0; - u64 stripe_nr; u64 last_offset; - u32 stripe_index; - u32 rot; const int data_stripes = nr_data_stripes(map); last_offset = (physical - map->stripes[num].physical) * data_stripes; @@ -2920,13 +2917,18 @@ static int get_raid56_logic_offset(u64 physical, int num, *offset = last_offset; for (i = 0; i < data_stripes; i++) { + u32 stripe_nr; + u32 stripe_index; + u32 rot; + *offset = last_offset + (i << BTRFS_STRIPE_LEN_SHIFT); - stripe_nr = *offset >> BTRFS_STRIPE_LEN_SHIFT; - stripe_nr = div_u64(stripe_nr, data_stripes); + stripe_nr = (u32)(*offset >> BTRFS_STRIPE_LEN_SHIFT) / + data_stripes; /* Work out the disk rotation on this stripe-set */ - stripe_nr = div_u64_rem(stripe_nr, map->num_stripes, &rot); + rot = stripe_nr % map->num_stripes; + stripe_nr /= map->num_stripes; /* calculate which stripe this data locates */ rot += i; stripe_index = rot % map->num_stripes; diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index baad1ed7e111..af04f6661035 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -849,6 +849,20 @@ int btrfs_check_chunk_valid(struct extent_buffer *leaf, stripe_len); return -EUCLEAN; } + /* + * We artificially limit the chunk size, so that the number of stripes + * inside a chunk can be fit into a U32. + * The current limit (256G) is way too large for real world usage + * anyway, and it's also much larger than our existing limit (10G). + * + * Thus it should be a good way to catch obvious bitflip. 
+ */ + if (unlikely(length >= ((u64)U32_MAX << BTRFS_STRIPE_LEN_SHIFT))) { + chunk_err(leaf, chunk, logical, + "chunk length too large: have %llu limit %llu", + length, (u64)U32_MAX << BTRFS_STRIPE_LEN_SHIFT); + return -EUCLEAN; + } if (unlikely(type & ~(BTRFS_BLOCK_GROUP_TYPE_MASK | BTRFS_BLOCK_GROUP_PROFILE_MASK))) { chunk_err(leaf, chunk, logical, diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 00ce6772c1cd..45ac6e14d268 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -5954,15 +5954,15 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info, struct btrfs_discard_stripe *stripes; u64 length = *length_ret; u64 offset; - u64 stripe_nr; - u64 stripe_nr_end; + u32 stripe_nr; + u32 stripe_nr_end; + u32 stripe_cnt; u64 stripe_end_offset; - u64 stripe_cnt; u64 stripe_offset; u32 stripe_index; u32 factor = 0; u32 sub_stripes = 0; - u64 stripes_per_dev = 0; + u32 stripes_per_dev = 0; u32 remaining_stripes = 0; u32 last_stripe = 0; int ret; @@ -6015,18 +6015,19 @@ struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info, factor = map->num_stripes / sub_stripes; *num_stripes = min_t(u64, map->num_stripes, sub_stripes * stripe_cnt); - stripe_nr = div_u64_rem(stripe_nr, factor, &stripe_index); + stripe_index = stripe_nr % factor; + stripe_nr /= factor; stripe_index *= sub_stripes; - stripes_per_dev = div_u64_rem(stripe_cnt, factor, - &remaining_stripes); - div_u64_rem(stripe_nr_end - 1, factor, &last_stripe); - last_stripe *= sub_stripes; + + remaining_stripes = stripe_cnt % factor; + stripes_per_dev = stripe_cnt / factor; + last_stripe = (stripe_nr_end - 1) % factor * sub_stripes; } else if (map->type & (BTRFS_BLOCK_GROUP_RAID1_MASK | BTRFS_BLOCK_GROUP_DUP)) { *num_stripes = map->num_stripes; } else { - stripe_nr = div_u64_rem(stripe_nr, map->num_stripes, - &stripe_index); + stripe_index = stripe_nr % map->num_stripes; + stripe_nr /= map->num_stripes; } stripes = kcalloc(*num_stripes, sizeof(*stripes), GFP_NOFS); @@ -6282,7 +6283,7 @@ static bool need_full_stripe(enum btrfs_map_op op) } static u64 btrfs_max_io_len(struct map_lookup *map, enum btrfs_map_op op, - u64 offset, u64 *stripe_nr, u64 *stripe_offset, + u64 offset, u32 *stripe_nr, u64 *stripe_offset, u64 *full_stripe_start) { ASSERT(op != BTRFS_MAP_DISCARD); @@ -6330,7 +6331,7 @@ static u64 btrfs_max_io_len(struct map_lookup *map, enum btrfs_map_op op, } static void set_io_stripe(struct btrfs_io_stripe *dst, const struct map_lookup *map, - u32 stripe_index, u64 stripe_offset, u64 stripe_nr) + u32 stripe_index, u64 stripe_offset, u32 stripe_nr) { dst->dev = map->stripes[stripe_index].dev; dst->physical = map->stripes[stripe_index].physical + @@ -6347,7 +6348,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, struct map_lookup *map; u64 map_offset; u64 stripe_offset; - u64 stripe_nr; + u32 stripe_nr; u32 stripe_index; int data_stripes; int i; @@ -6405,8 +6406,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, num_stripes = 1; stripe_index = 0; if (map->type & BTRFS_BLOCK_GROUP_RAID0) { - stripe_nr = div_u64_rem(stripe_nr, map->num_stripes, - &stripe_index); + stripe_index = stripe_nr % map->num_stripes; + stripe_nr /= map->num_stripes; if (!need_full_stripe(op)) mirror_num = 1; } else if (map->type & BTRFS_BLOCK_GROUP_RAID1_MASK) { @@ -6432,8 +6433,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, } else if (map->type & BTRFS_BLOCK_GROUP_RAID10) { u32 factor = map->num_stripes / 
map->sub_stripes; - stripe_nr = div_u64_rem(stripe_nr, factor, &stripe_index); - stripe_index *= map->sub_stripes; + stripe_index = stripe_nr % factor * map->sub_stripes; + stripe_nr /= factor; if (need_full_stripe(op)) num_stripes = map->sub_stripes; @@ -6449,9 +6450,16 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, } else if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) { if (need_raid_map && (need_full_stripe(op) || mirror_num > 1)) { - /* push stripe_nr back to the start of the full stripe */ - stripe_nr = div64_u64(raid56_full_stripe_start, - data_stripes << BTRFS_STRIPE_LEN_SHIFT); + /* + * Push stripe_nr back to the start of the full stripe + * For those cases needing a full stripe, @stripe_nr + * is the full stripe number. + * + * Original we go raid56_full_stripe_start / full_stripe_len, + * but that can be expensive. + * Here we just divide @stripe_nr with @data_stripes. + */ + stripe_nr /= data_stripes; /* RAID[56] write or recovery. Return all stripes */ num_stripes = map->num_stripes; @@ -6469,25 +6477,24 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, * Mirror #2 is RAID5 parity block. * Mirror #3 is RAID6 Q block. */ - stripe_nr = div_u64_rem(stripe_nr, - data_stripes, &stripe_index); + stripe_index = stripe_nr % data_stripes; + stripe_nr /= data_stripes; if (mirror_num > 1) stripe_index = data_stripes + mirror_num - 2; /* We distribute the parity blocks across stripes */ - div_u64_rem(stripe_nr + stripe_index, map->num_stripes, - &stripe_index); + stripe_index = (stripe_nr + stripe_index) % map->num_stripes; if (!need_full_stripe(op) && mirror_num <= 1) mirror_num = 1; } } else { /* - * after this, stripe_nr is the number of stripes on this + * After this, stripe_nr is the number of stripes on this * device we have to walk to find the data, and stripe_index is * the number of our device in the stripe array */ - stripe_nr = div_u64_rem(stripe_nr, map->num_stripes, - &stripe_index); + stripe_index = stripe_nr % map->num_stripes; + stripe_nr /= map->num_stripes; mirror_num = stripe_index + 1; } if (stripe_index >= map->num_stripes) { @@ -6549,7 +6556,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, unsigned rot; /* Work out the disk rotation on this stripe-set */ - div_u64_rem(stripe_nr, num_stripes, &rot); + rot = stripe_nr % num_stripes; /* Fill in the logical address of each stripe */ tmp = stripe_nr * data_stripes; From patchwork Fri Feb 17 05:37:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13144332 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7B99CC64EC4 for ; Fri, 17 Feb 2023 05:37:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229602AbjBQFhl (ORCPT ); Fri, 17 Feb 2023 00:37:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43972 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229628AbjBQFhk (ORCPT ); Fri, 17 Feb 2023 00:37:40 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6F1F05BDA8 for ; Thu, 16 Feb 2023 21:37:27 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using 
TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 31FEC208B4; Fri, 17 Feb 2023 05:37:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1676612246; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=r22TTCPbkXuhxgb1VGt4MQpVWEmpWeNGbSEfmDCUNAM=; b=ieFvwjN2QQT5dxlhGZNRGFSxIXIuNVPQSY9TcLri/dG4Og+FPFE2KyVP/tl2zS72MXqwIo WkP/szZ3soZiVVnAkbhmrSaUUOdJFmBtv4Lx/j0ZaHXg6rLy63h72v42i4aq8z7cj47CzD Q/ygRyMnGN8O8y3V3XjLzIeg0ZzOgSA= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id E25F51323E; Fri, 17 Feb 2023 05:37:24 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id wHi1KpQS72PTSQAAMHmgww (envelope-from ); Fri, 17 Feb 2023 05:37:24 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: Johannes Thumshirn , Anand Jain , David Sterba Subject: [PATCH 3/6] btrfs: simplify the bioc argument for handle_ops_on_dev_replace() Date: Fri, 17 Feb 2023 13:37:00 +0800 Message-Id: <84d631e452d118087b4a3cdf9994695412c95462.1676611535.git.wqu@suse.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org There is no memory re-allocation for handle_ops_on_dev_replace(), thus we don't need to pass a btrfs_io_context pointer. 
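
  For illustration only, a small plain-C sketch (not the btrfs code) of the
  design choice: a callee that only updates fields of an existing object needs
  a single pointer, while a pointer-to-pointer is only required when the
  callee may replace the allocation itself:

  #include <stdlib.h>

  struct io_ctx { int num_stripes; };	/* hypothetical stand-in for btrfs_io_context */

  /* A double pointer is only needed when the callee may hand back a
   * different allocation. */
  static int grow_ctx(struct io_ctx **ctxp)
  {
  	struct io_ctx *tmp = realloc(*ctxp, sizeof(**ctxp));

  	if (!tmp)
  		return -1;
  	*ctxp = tmp;
  	return 0;
  }

  /* No re-allocation happens, so a plain pointer is enough -- the same
   * simplification this patch applies to handle_ops_on_dev_replace(). */
  static void update_ctx(struct io_ctx *ctx, int num_stripes)
  {
  	ctx->num_stripes = num_stripes;
  }

  int main(void)
  {
  	struct io_ctx *ctx = calloc(1, sizeof(*ctx));

  	if (!ctx)
  		return 1;
  	if (grow_ctx(&ctx)) {
  		free(ctx);
  		return 1;
  	}
  	update_ctx(ctx, 4);
  	free(ctx);
  	return 0;
  }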
Reviewed-by: Johannes Thumshirn Reviewed-by: Anand Jain Signed-off-by: Qu Wenruo Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/volumes.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 45ac6e14d268..4408b1b0d4d1 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6180,12 +6180,11 @@ static bool is_block_group_to_copy(struct btrfs_fs_info *fs_info, u64 logical) } static void handle_ops_on_dev_replace(enum btrfs_map_op op, - struct btrfs_io_context **bioc_ret, + struct btrfs_io_context *bioc, struct btrfs_dev_replace *dev_replace, u64 logical, int *num_stripes_ret, int *max_errors_ret) { - struct btrfs_io_context *bioc = *bioc_ret; u64 srcdev_devid = dev_replace->srcdev->devid; int tgtdev_indexes = 0; int num_stripes = *num_stripes_ret; @@ -6274,7 +6273,6 @@ static void handle_ops_on_dev_replace(enum btrfs_map_op op, *num_stripes_ret = num_stripes; *max_errors_ret = max_errors; bioc->num_tgtdevs = tgtdev_indexes; - *bioc_ret = bioc; } static bool need_full_stripe(enum btrfs_map_op op) @@ -6577,7 +6575,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL && need_full_stripe(op)) { - handle_ops_on_dev_replace(op, &bioc, dev_replace, logical, + handle_ops_on_dev_replace(op, bioc, dev_replace, logical, &num_stripes, &max_errors); } From patchwork Fri Feb 17 05:37:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13144333 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2EE05C636D4 for ; Fri, 17 Feb 2023 05:37:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229539AbjBQFhn (ORCPT ); Fri, 17 Feb 2023 00:37:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44054 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229625AbjBQFhm (ORCPT ); Fri, 17 Feb 2023 00:37:42 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AE27C5BD90 for ; Thu, 16 Feb 2023 21:37:28 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 6EEF420096; Fri, 17 Feb 2023 05:37:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1676612247; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=agBbaIn7MmTwKvV1SVrDSgInzC5iHOOmcRL8bO0WY0E=; b=WYI93JGbirXD1Szq7ZUmdQKZ3wZGkgeA/t4YAcvlDfTQC7TmtDqRYwbjbVAPglVNg6+0Hy H0mvBDDPlXQoXAZUdR6tSz7nbVDh6TWkj4Rk+IkJXQVfK1QpYg0RCkfsCPA45BIDyIkO+n t5+o1WHnGavtEmrxZgi1izaCxNBAKEg= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by 
imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 9212F1323E; Fri, 17 Feb 2023 05:37:26 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id KI9eF5YS72PTSQAAMHmgww (envelope-from ); Fri, 17 Feb 2023 05:37:26 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: David Sterba Subject: [PATCH 4/6] btrfs: reduce type width of btrfs_io_contexts Date: Fri, 17 Feb 2023 13:37:01 +0800 Message-Id: <2d8b241029979a83135e5e8c212b3547e822a37e.1676611535.git.wqu@suse.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org That structure is our ultimate object for all __btrfs_map_block() related functions. We have some hard to understand members, like tgtdev_map, but without any comments. This patch will improve the situation: - Add extra comments for num_stripes, mirror_num, num_tgtdevs and tgtdev_map[] Especially for the last two members, add a dedicated (thus very long) comments for them, with example to explain it. - Shrink those int members to u16. In fact our on-disk format is only using u16 for num_stripes, thus no need to use int at all. Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/volumes.c | 16 ++++++++------ fs/btrfs/volumes.h | 54 +++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 58 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 4408b1b0d4d1..f3b6d977ae5c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -5898,16 +5898,18 @@ static void sort_parity_stripes(struct btrfs_io_context *bioc, int num_stripes) } static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_info, - int total_stripes, - int real_stripes) + u16 total_stripes, + u16 real_stripes) { - struct btrfs_io_context *bioc = kzalloc( + struct btrfs_io_context *bioc; + + bioc = kzalloc( /* The size of btrfs_io_context */ sizeof(struct btrfs_io_context) + /* Plus the variable array for the stripes */ sizeof(struct btrfs_io_stripe) * (total_stripes) + /* Plus the variable array for the tgt dev */ - sizeof(int) * (real_stripes) + + sizeof(u16) * (real_stripes) + /* * Plus the raid_map, which includes both the tgt dev * and the stripes. @@ -5921,7 +5923,7 @@ static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_ refcount_set(&bioc->refs, 1); bioc->fs_info = fs_info; - bioc->tgtdev_map = (int *)(bioc->stripes + total_stripes); + bioc->tgtdev_map = (u16 *)(bioc->stripes + total_stripes); bioc->raid_map = (u64 *)(bioc->tgtdev_map + real_stripes); return bioc; @@ -6354,12 +6356,12 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, int mirror_num = (mirror_num_ret ? 
*mirror_num_ret : 0); int num_stripes; int max_errors = 0; - int tgtdev_indexes = 0; struct btrfs_io_context *bioc = NULL; struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace; int dev_replace_is_ongoing = 0; - int num_alloc_stripes; int patch_the_first_stripe_for_dev_replace = 0; + u16 num_alloc_stripes; + u16 tgtdev_indexes = 0; u64 physical_to_patch_in_first_stripe = 0; u64 raid56_full_stripe_start = (u64)-1; u64 max_len; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index dad4a5d39822..34fe3a895301 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -407,11 +407,55 @@ struct btrfs_io_context { u64 map_type; /* get from map_lookup->type */ struct bio *orig_bio; atomic_t error; - int max_errors; - int num_stripes; - int mirror_num; - int num_tgtdevs; - int *tgtdev_map; + u16 max_errors; + + /* + * The total number of stripes, including the extra duplicated + * stripe for replace. + */ + u16 num_stripes; + + /* + * The mirror_num of this bioc. + * + * This is for reads which use 0 as mirror_num, thus we should return a + * valid mirror_num (>0) for the reader. + */ + u16 mirror_num; + + /* + * The following two members are for dev-replace case only. + * + * @num_tgtdevs: Number of duplicated stripes which need to be + * written to replace target. + * Should be <= 2 (2 for DUP, otherwise <= 1). + * @tgtdev_map: The array indicates where the duplicated stripes + * are from. The size is the number of original + * stripes (num_stripes - num_tgtdevs). + * + * The @tgtdev_map[] array is mostly for RAID56 cases. + * As non-RAID56 stripes share the same contents of the mapped range, + * thus no need to bother where the duplicated ones are from. + * + * But for RAID56 case, all stripes contain different contents, thus + * we need a way to know the mapping. + * + * There is an example for the two members, using a RAID5 write: + * + * num_stripes: 4 (3 + 1 duplicated write) + * stripes[0]: dev = devid 1, physical = X + * stripes[1]: dev = devid 2, physical = Y + * stripes[2]: dev = devid 3, physical = Z + * stripes[3]: dev = devid 0, physical = Y + * + * num_tgtdevs = 1 + * tgtdev_map[0] = 0 <- Means stripes[0] is not involved in replace. + * tgtdev_map[1] = 3 <- Means stripes[1] is involved in replace, + * and it's duplicated to stripes[3]. + * tgtdev_map[2] = 0 <- Means stripes[2] is not involved in replace. + */ + u16 num_tgtdevs; + u16 *tgtdev_map; /* * logical block numbers for the start of each stripe * The last one or two are p/q. 
These are sorted, From patchwork Fri Feb 17 05:37:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13144334 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73D8BC05027 for ; Fri, 17 Feb 2023 05:37:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229508AbjBQFhp (ORCPT ); Fri, 17 Feb 2023 00:37:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229461AbjBQFho (ORCPT ); Fri, 17 Feb 2023 00:37:44 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2001:67c:2178:6::1d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2AC165BDA0 for ; Thu, 16 Feb 2023 21:37:30 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id ADBBC20079; Fri, 17 Feb 2023 05:37:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1676612248; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mX9qvX0epn44w2NTZaGq+AKiA2hNSppIveXyb+xKPG0=; b=XV1vc5U1cYxahejDBb9Tu7WRK8tA2WXrVO9YkXLjgF2XqpeLx/Cl+3U6YdktkVsYYkI+XZ aFQa/TcvNuycC7O1GMU2KQWd0gpq8Ckh2oarXHSnoP24nfTNaLv9LAX1EdbpwCSq0iCKig HOAN2fpybd62EIeiXGChbLyxdSSKYNM= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id CFE1A1323E; Fri, 17 Feb 2023 05:37:27 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id AOVYJpcS72PTSQAAMHmgww (envelope-from ); Fri, 17 Feb 2023 05:37:27 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: David Sterba Subject: [PATCH 5/6] btrfs: use a more space efficient way to represent the source of duplicated stripes Date: Fri, 17 Feb 2023 13:37:02 +0800 Message-Id: <6ec2752ca0e56ba30e49813b222898bb96e24247.1676611535.git.wqu@suse.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org For btrfs dev-replace, we have to duplicate writes to the source device into the target device. For non-RAID56, all writes into the same mapped ranges are sharing the same content, thus they don't really need to bother anything. (E.g. in btrfs_submit_bio() for non-RAID56 range we just submit the same write to all involved devices). But for RAID56, all stripes contain different content, thus we must have a clear mapping of which stripe is duplicated from which original stripe. Currently we use a complex way using tgtdev_map[] array, e.g: num_tgtdevs = 1 tgtdev_map[0] = 0 <- Means stripes[0] is not involved in replace. 
tgtdev_map[1] = 3 <- Means stripes[1] is involved in replace, and it's duplicated to stripes[3]. tgtdev_map[2] = 0 <- Means stripes[2] is not involved in replace. But this is wasting some space, and ignored one important thing for dev-replace, there is at most ONE running replace. Thus we can change it to a fixed array to represent the mapping: replace_nr_stripes = 1 replace_stripe_src = 1 <- Means stripes[1] is involved in replace. thus the extra stripe is a copy of stripes[1] By this we can save some space for bioc on RAID56 chunks with many devices. And we get rid of one variable sized array from bioc. Thus the patch involves the following changes: - Replace @num_tgtdevs and @tgtdev_map[] with @replace_nr_stripes and @replace_stripe_src. @num_tgtdevs is just renamed to @replace_nr_stripes. While the mapping is completely changed. - Add extra ASSERT()s for RAID56 code - Only add two more extra stripes for dev-replace cases. As we have a upper limit on how many dev-replace stripes we can have. - Unify the behavior of handle_ops_on_dev_replace() Previously handle_ops_on_dev_replace() go two different paths for WRITE and GET_READ_MIRRORS. Now unify them by always go WRITE path first (with at most 2 replace stripes), then if we're doing GET_READ_MIRRORS and we have 2 extra stripes, just drop one stripe. - Remove the @real_stripes argument from alloc_btrfs_io_context() As we don't need the old variable length array any more. Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/raid56.c | 37 +++++++--- fs/btrfs/scrub.c | 4 +- fs/btrfs/volumes.c | 167 ++++++++++++++++++++------------------------- fs/btrfs/volumes.h | 26 +++---- 4 files changed, 119 insertions(+), 115 deletions(-) diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 642828c1b299..e495ac147822 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -912,7 +912,7 @@ static struct sector_ptr *sector_in_rbio(struct btrfs_raid_bio *rbio, static struct btrfs_raid_bio *alloc_rbio(struct btrfs_fs_info *fs_info, struct btrfs_io_context *bioc) { - const unsigned int real_stripes = bioc->num_stripes - bioc->num_tgtdevs; + const unsigned int real_stripes = bioc->num_stripes - bioc->replace_nr_stripes; const unsigned int stripe_npages = BTRFS_STRIPE_LEN >> PAGE_SHIFT; const unsigned int num_pages = stripe_npages * real_stripes; const unsigned int stripe_nsectors = @@ -1282,10 +1282,16 @@ static int rmw_assemble_write_bios(struct btrfs_raid_bio *rbio, goto error; } - if (likely(!rbio->bioc->num_tgtdevs)) + if (likely(!rbio->bioc->replace_nr_stripes)) return 0; - /* Make a copy for the replace target device. */ + /* + * Make a copy for the replace target device. + * + * Thus the source stripe number (in replace_stripe_src) should be valid. + */ + ASSERT(rbio->bioc->replace_stripe_src >= 0); + for (total_sector_nr = 0; total_sector_nr < rbio->nr_sectors; total_sector_nr++) { struct sector_ptr *sector; @@ -1293,7 +1299,12 @@ static int rmw_assemble_write_bios(struct btrfs_raid_bio *rbio, stripe = total_sector_nr / rbio->stripe_nsectors; sectornr = total_sector_nr % rbio->stripe_nsectors; - if (!rbio->bioc->tgtdev_map[stripe]) { + /* + * For RAID56, there is only one device that can be replaced, + * and replace_stripe_src[0] indicates the stripe number we + * need to copy from. + */ + if (stripe != rbio->bioc->replace_stripe_src) { /* * We can skip the whole stripe completely, note * total_sector_nr will be increased by one anyway. 
@@ -1316,7 +1327,7 @@ static int rmw_assemble_write_bios(struct btrfs_raid_bio *rbio, } ret = rbio_add_io_sector(rbio, bio_list, sector, - rbio->bioc->tgtdev_map[stripe], + rbio->real_stripes, sectornr, REQ_OP_WRITE); if (ret) goto error; @@ -2442,7 +2453,12 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio, int need_check) else BUG(); - if (bioc->num_tgtdevs && bioc->tgtdev_map[rbio->scrubp]) { + /* + * Replace is running and our P/Q stripe is being replaced, then we + * need to duplicate the final write to replace target. + */ + if (bioc->replace_nr_stripes && + bioc->replace_stripe_src == rbio->scrubp) { is_replace = 1; bitmap_copy(pbitmap, &rbio->dbitmap, rbio->stripe_nsectors); } @@ -2544,13 +2560,18 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio, int need_check) if (!is_replace) goto submit_write; + /* + * Replace is running and our parity stripe needs to be duplicated to + * the target device. Check we have a valid source stripe number. + */ + ASSERT(rbio->bioc->replace_stripe_src >= 0); for_each_set_bit(sectornr, pbitmap, rbio->stripe_nsectors) { struct sector_ptr *sector; sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr); ret = rbio_add_io_sector(rbio, &bio_list, sector, - bioc->tgtdev_map[rbio->scrubp], - sectornr, REQ_OP_WRITE); + rbio->real_stripes, + sectornr, REQ_OP_WRITE); if (ret) goto cleanup; } diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 3b4e621e822f..83de9fecc7b6 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1230,7 +1230,7 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check) sblock_other = sblocks_for_recheck[mirror_index]; } else { struct scrub_recover *r = sblock_bad->sectors[0]->recover; - int max_allowed = r->bioc->num_stripes - r->bioc->num_tgtdevs; + int max_allowed = r->bioc->num_stripes - r->bioc->replace_nr_stripes; if (mirror_index >= max_allowed) break; @@ -1540,7 +1540,7 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock, bioc->map_type, bioc->raid_map, bioc->num_stripes - - bioc->num_tgtdevs, + bioc->replace_nr_stripes, mirror_index, &stripe_index, &stripe_offset); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index f3b6d977ae5c..31a25281c7d8 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -5898,8 +5898,7 @@ static void sort_parity_stripes(struct btrfs_io_context *bioc, int num_stripes) } static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_info, - u16 total_stripes, - u16 real_stripes) + u16 total_stripes) { struct btrfs_io_context *bioc; @@ -5908,8 +5907,6 @@ static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_ sizeof(struct btrfs_io_context) + /* Plus the variable array for the stripes */ sizeof(struct btrfs_io_stripe) * (total_stripes) + - /* Plus the variable array for the tgt dev */ - sizeof(u16) * (real_stripes) + /* * Plus the raid_map, which includes both the tgt dev * and the stripes. 
@@ -5923,8 +5920,8 @@ static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_ refcount_set(&bioc->refs, 1); bioc->fs_info = fs_info; - bioc->tgtdev_map = (u16 *)(bioc->stripes + total_stripes); - bioc->raid_map = (u64 *)(bioc->tgtdev_map + real_stripes); + bioc->raid_map = (u64 *)(bioc->stripes + total_stripes); + bioc->replace_stripe_src = -1; return bioc; } @@ -6188,93 +6185,75 @@ static void handle_ops_on_dev_replace(enum btrfs_map_op op, int *num_stripes_ret, int *max_errors_ret) { u64 srcdev_devid = dev_replace->srcdev->devid; - int tgtdev_indexes = 0; + /* + * At this stage, num_stripes is still the real number of stripes, + * excluding the duplicated stripes. + */ int num_stripes = *num_stripes_ret; + int nr_extra_stripes = 0; int max_errors = *max_errors_ret; int i; - if (op == BTRFS_MAP_WRITE) { - int index_where_to_add; + /* + * A block group which has "to_copy" set will eventually be copied by + * the dev-replace process. We can avoid cloning IO here. + */ + if (is_block_group_to_copy(dev_replace->srcdev->fs_info, logical)) + return; + + /* + * Duplicate the write operations while the dev-replace procedure is + * running. Since the copying of the old disk to the new disk takes + * place at run time while the filesystem is mounted writable, the + * regular write operations to the old disk have to be duplicated to go + * to the new disk as well. + * + * Note that device->missing is handled by the caller, and that the + * write to the old disk is already set up in the stripes array. + */ + for (i = 0; i < num_stripes; i++) { + struct btrfs_io_stripe *old = &bioc->stripes[i]; + struct btrfs_io_stripe *new = &bioc->stripes[num_stripes + nr_extra_stripes]; + + if (old->dev->devid != srcdev_devid) + continue; + + new->physical = old->physical; + new->dev = dev_replace->tgtdev; + if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) + bioc->replace_stripe_src = i; + nr_extra_stripes++; + } + + /* We can only have at most 2 extra nr_stripes (for DUP). */ + ASSERT(nr_extra_stripes <= 2); + /* + * For GET_READ_MIRRORS, we can only return at most 1 extra stripe for + * replace. + * If we have 2 extra stripes, only choose the one with smaller physical. + */ + if (op == BTRFS_MAP_GET_READ_MIRRORS && nr_extra_stripes == 2) { + struct btrfs_io_stripe *first = &bioc->stripes[num_stripes]; + struct btrfs_io_stripe *second = &bioc->stripes[num_stripes + 1]; + + /* Only DUP can have two extra stripes. */ + ASSERT(bioc->map_type & BTRFS_BLOCK_GROUP_DUP); /* - * A block group which have "to_copy" set will eventually - * copied by dev-replace process. We can avoid cloning IO here. + * Swap the last stripe stripes and just reduce + * @nr_extra_stripes. The extra stripe would still be there, + * but won't be accessed. */ - if (is_block_group_to_copy(dev_replace->srcdev->fs_info, logical)) - return; - - /* - * duplicate the write operations while the dev replace - * procedure is running. Since the copying of the old disk to - * the new disk takes place at run time while the filesystem is - * mounted writable, the regular write operations to the old - * disk have to be duplicated to go to the new disk as well. - * - * Note that device->missing is handled by the caller, and that - * the write to the old disk is already set up in the stripes - * array. 
- */ - index_where_to_add = num_stripes; - for (i = 0; i < num_stripes; i++) { - if (bioc->stripes[i].dev->devid == srcdev_devid) { - /* write to new disk, too */ - struct btrfs_io_stripe *new = - bioc->stripes + index_where_to_add; - struct btrfs_io_stripe *old = - bioc->stripes + i; - - new->physical = old->physical; - new->dev = dev_replace->tgtdev; - bioc->tgtdev_map[i] = index_where_to_add; - index_where_to_add++; - max_errors++; - tgtdev_indexes++; - } - } - num_stripes = index_where_to_add; - } else if (op == BTRFS_MAP_GET_READ_MIRRORS) { - int index_srcdev = 0; - int found = 0; - u64 physical_of_found = 0; - - /* - * During the dev-replace procedure, the target drive can also - * be used to read data in case it is needed to repair a corrupt - * block elsewhere. This is possible if the requested area is - * left of the left cursor. In this area, the target drive is a - * full copy of the source drive. - */ - for (i = 0; i < num_stripes; i++) { - if (bioc->stripes[i].dev->devid == srcdev_devid) { - /* - * In case of DUP, in order to keep it simple, - * only add the mirror with the lowest physical - * address - */ - if (found && - physical_of_found <= bioc->stripes[i].physical) - continue; - index_srcdev = i; - found = 1; - physical_of_found = bioc->stripes[i].physical; - } - } - if (found) { - struct btrfs_io_stripe *tgtdev_stripe = - bioc->stripes + num_stripes; - - tgtdev_stripe->physical = physical_of_found; - tgtdev_stripe->dev = dev_replace->tgtdev; - bioc->tgtdev_map[index_srcdev] = num_stripes; - - tgtdev_indexes++; - num_stripes++; + if (first->physical > second->physical) { + swap(second->physical, first->physical); + swap(second->dev, first->dev); + nr_extra_stripes--; } } - *num_stripes_ret = num_stripes; - *max_errors_ret = max_errors; - bioc->num_tgtdevs = tgtdev_indexes; + *num_stripes_ret = num_stripes + nr_extra_stripes; + *max_errors_ret = max_errors + nr_extra_stripes; + bioc->replace_nr_stripes = nr_extra_stripes; } static bool need_full_stripe(enum btrfs_map_op op) @@ -6361,7 +6340,6 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, int dev_replace_is_ongoing = 0; int patch_the_first_stripe_for_dev_replace = 0; u16 num_alloc_stripes; - u16 tgtdev_indexes = 0; u64 physical_to_patch_in_first_stripe = 0; u64 raid56_full_stripe_start = (u64)-1; u64 max_len; @@ -6506,13 +6484,16 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, } num_alloc_stripes = num_stripes; - if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL) { - if (op == BTRFS_MAP_WRITE) - num_alloc_stripes <<= 1; - if (op == BTRFS_MAP_GET_READ_MIRRORS) - num_alloc_stripes++; - tgtdev_indexes = num_stripes; - } + if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL && + op != BTRFS_MAP_READ) + /* + * For replace case, we need to add extra stripes for extra + * duplicated stripes. + * + * For both WRITE and GET_READ_MIRRORS, we may have at most + * 2 more stripes (DUP types, otherwise 1). 
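The comment above belongs to the matching over-allocation (num_alloc_stripes += 2 just below), which guarantees the duplicates always fit. The GET_READ_MIRRORS handling for DUP earlier in this hunk keeps only the extra mirror at the lower physical offset; its intent can be sketched as follows (hypothetical helper name, same simplified stand-in struct as in the earlier sketch; not the kernel function):

#include <stdint.h>

struct stripe {            /* same stand-in layout as the earlier sketch */
        uint64_t devid;
        uint64_t physical;
};

/*
 * For GET_READ_MIRRORS on DUP both copies live on the source device, so
 * two duplicates were appended.  Keep the one with the smaller physical
 * offset in the first extra slot and report a single extra stripe; the
 * other entry stays allocated but is never accessed.
 */
static int trim_extra_read_mirror(struct stripe *stripes, int num_stripes,
                                  int nr_extra_stripes)
{
        if (nr_extra_stripes != 2)
                return nr_extra_stripes;

        struct stripe *first = &stripes[num_stripes];
        struct stripe *second = &stripes[num_stripes + 1];

        if (first->physical > second->physical) {
                /* Swap the duplicates so the lower offset comes first. */
                struct stripe tmp = *first;
                *first = *second;
                *second = tmp;
        }
        return nr_extra_stripes - 1;
}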
+ */ + num_alloc_stripes += 2; /* * If this I/O maps to a single device, try to return the device and @@ -6537,11 +6518,12 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, goto out; } - bioc = alloc_btrfs_io_context(fs_info, num_alloc_stripes, tgtdev_indexes); + bioc = alloc_btrfs_io_context(fs_info, num_alloc_stripes); if (!bioc) { ret = -ENOMEM; goto out; } + bioc->map_type = map->type; for (i = 0; i < num_stripes; i++) { set_io_stripe(&bioc->stripes[i], map, stripe_index, stripe_offset, @@ -6582,7 +6564,6 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, } *bioc_ret = bioc; - bioc->map_type = map->type; bioc->num_stripes = num_stripes; bioc->max_errors = max_errors; bioc->mirror_num = mirror_num; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 34fe3a895301..1da65538c7e2 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -426,14 +426,13 @@ struct btrfs_io_context { /* * The following two members are for dev-replace case only. * - * @num_tgtdevs: Number of duplicated stripes which need to be + * @replace_nr_stripes: Number of duplicated stripes which need to be * written to replace target. * Should be <= 2 (2 for DUP, otherwise <= 1). - * @tgtdev_map: The array indicates where the duplicated stripes - * are from. The size is the number of original - * stripes (num_stripes - num_tgtdevs). + * @replace_stripe_src: The array indicates where the duplicated stripes + * are from. * - * The @tgtdev_map[] array is mostly for RAID56 cases. + * The @replace_stripe_src[] array is mostly for RAID56 cases. * As non-RAID56 stripes share the same contents of the mapped range, * thus no need to bother where the duplicated ones are from. * @@ -448,14 +447,17 @@ struct btrfs_io_context { * stripes[2]: dev = devid 3, physical = Z * stripes[3]: dev = devid 0, physical = Y * - * num_tgtdevs = 1 - * tgtdev_map[0] = 0 <- Means stripes[0] is not involved in replace. - * tgtdev_map[1] = 3 <- Means stripes[1] is involved in replace, - * and it's duplicated to stripes[3]. - * tgtdev_map[2] = 0 <- Means stripes[2] is not involved in replace. + * replace_nr_stripes = 1 + * replace_stripe_src = 1 <- Means stripes[1] is involved in replace. + * The duplicated stripe index would be + * (@num_stripes - 1). + * + * Note, that we can still have cases replace_nr_stripes = 2 for DUP. + * In that case, all stripes share the same content, thus we don't + * need to bother @replace_stripe_src value at all. */ - u16 num_tgtdevs; - u16 *tgtdev_map; + u16 replace_nr_stripes; + s16 replace_stripe_src; /* * logical block numbers for the start of each stripe * The last one or two are p/q. 
These are sorted, From patchwork Fri Feb 17 05:37:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qu Wenruo X-Patchwork-Id: 13144335 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8E232C636D4 for ; Fri, 17 Feb 2023 05:37:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229525AbjBQFhr (ORCPT ); Fri, 17 Feb 2023 00:37:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44210 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229445AbjBQFhq (ORCPT ); Fri, 17 Feb 2023 00:37:46 -0500 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4E96A5B771 for ; Thu, 16 Feb 2023 21:37:31 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id F1A3734028; Fri, 17 Feb 2023 05:37:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1676612250; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5YU+CznA1wPWPQVRWdMDHFqYnPgXBc72uG/I5f6NW+Q=; b=SN6gKuu5d2EJzvDj+ZtC7+huCjfcQ7PdTWZia2JyHxPr+tnV1LIayromO/vHokGRedJO31 fAl9A4c961ceCCmIlT7Y5wF7Qjgs2lwc0zsgpNcy+vt16YPkD0P43sq+gcDbmYkqN5PnQe LNvB+bbGrGuSI1eJhuMNMsFtA3bLgM= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 1B6111323E; Fri, 17 Feb 2023 05:37:28 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id GB3vNZgS72PTSQAAMHmgww (envelope-from ); Fri, 17 Feb 2023 05:37:28 +0000 From: Qu Wenruo To: linux-btrfs@vger.kernel.org Cc: David Sterba Subject: [PATCH 6/6] btrfs: replace btrfs_io_context::raid_map with a fixed u64 value Date: Fri, 17 Feb 2023 13:37:03 +0800 Message-Id: <43c7dfc22297d3bc8e564e10695a403bed7f18c2.1676611536.git.wqu@suse.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org In the btrfs_io_context structure we have a pointer, raid_map, which indicates the logical bytenr of each stripe. But since we always call sort_parity_stripes(), the resulting raid_map[] is always sorted, thus raid_map[0] is always the logical bytenr of the full stripe. So why waste the space, and the time spent sorting, on raid_map? This patch replaces btrfs_io_context::raid_map with a single u64 number, full_stripe_logical, by: - Replacing btrfs_io_context::raid_map with full_stripe_logical - Making call sites that used raid_map[0] use full_stripe_logical instead - Making call sites that used raid_map[i] compare against nr_data_stripes instead. The benefits are: - Less memory wasted on raid_map It's sizeof(u64) * num_stripes vs a single sizeof(u64).
It's always a save of at least one u64, and the benefit grows larger with num_stripes. - No more weird alloc_btrfs_io_context() behavior As there is only one fixed size + one variable length array. Signed-off-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/raid56.c | 31 +++++++------ fs/btrfs/scrub.c | 26 +++++------ fs/btrfs/volumes.c | 84 ++++++++++++++---------------------- fs/btrfs/volumes.h | 19 ++++++-- include/trace/events/btrfs.h | 2 +- 5 files changed, 78 insertions(+), 84 deletions(-) diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index e495ac147822..e7eccadca5f0 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -202,7 +202,7 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio) */ static int rbio_bucket(struct btrfs_raid_bio *rbio) { - u64 num = rbio->bioc->raid_map[0]; + u64 num = rbio->bioc->full_stripe_logical; /* * we shift down quite a bit. We're using byte @@ -571,7 +571,7 @@ static int rbio_can_merge(struct btrfs_raid_bio *last, test_bit(RBIO_CACHE_BIT, &cur->flags)) return 0; - if (last->bioc->raid_map[0] != cur->bioc->raid_map[0]) + if (last->bioc->full_stripe_logical != cur->bioc->full_stripe_logical) return 0; /* we can't merge with different operations */ @@ -666,7 +666,7 @@ static noinline int lock_stripe_add(struct btrfs_raid_bio *rbio) spin_lock_irqsave(&h->lock, flags); list_for_each_entry(cur, &h->hash_list, hash_list) { - if (cur->bioc->raid_map[0] != rbio->bioc->raid_map[0]) + if (cur->bioc->full_stripe_logical != rbio->bioc->full_stripe_logical) continue; spin_lock(&cur->bio_list_lock); @@ -1119,7 +1119,7 @@ static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio) struct bio_vec bvec; struct bvec_iter iter; u32 offset = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - - rbio->bioc->raid_map[0]; + rbio->bioc->full_stripe_logical; bio_for_each_segment(bvec, bio, iter) { u32 bvec_offset; @@ -1343,7 +1343,7 @@ static void set_rbio_range_error(struct btrfs_raid_bio *rbio, struct bio *bio) { struct btrfs_fs_info *fs_info = rbio->bioc->fs_info; u32 offset = (bio->bi_iter.bi_sector << SECTOR_SHIFT) - - rbio->bioc->raid_map[0]; + rbio->bioc->full_stripe_logical; int total_nr_sector = offset >> fs_info->sectorsize_bits; ASSERT(total_nr_sector < rbio->nr_data * rbio->stripe_nsectors); @@ -1620,7 +1620,7 @@ static void rbio_add_bio(struct btrfs_raid_bio *rbio, struct bio *orig_bio) { const struct btrfs_fs_info *fs_info = rbio->bioc->fs_info; const u64 orig_logical = orig_bio->bi_iter.bi_sector << SECTOR_SHIFT; - const u64 full_stripe_start = rbio->bioc->raid_map[0]; + const u64 full_stripe_start = rbio->bioc->full_stripe_logical; const u32 orig_len = orig_bio->bi_iter.bi_size; const u32 sectorsize = fs_info->sectorsize; u64 cur_logical; @@ -1807,9 +1807,8 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr, * here due to a crc mismatch and we can't give them the * data they want. */ - if (rbio->bioc->raid_map[failb] == RAID6_Q_STRIPE) { - if (rbio->bioc->raid_map[faila] == - RAID5_P_STRIPE) + if (failb == rbio->real_stripes - 1) { + if (faila == rbio->real_stripes - 2) /* * Only P and Q are corrupted. 
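With raid_map[] gone, the hunk above identifies a failed parity stripe purely by position: the data stripes come first, followed by P, and by Q for RAID6, so parity can be recognized from the stripe index alone. A small standalone helper expressing that rule (hypothetical, not part of the patch):

enum stripe_role { STRIPE_DATA, STRIPE_P, STRIPE_Q };

/*
 * real_stripes is the number of stripes excluding any replace
 * duplicates; is_raid6 selects whether a Q stripe exists at all.
 */
static enum stripe_role stripe_role(int stripe_nr, int real_stripes, int is_raid6)
{
        if (is_raid6 && stripe_nr == real_stripes - 1)
                return STRIPE_Q;
        if (stripe_nr == real_stripes - (is_raid6 ? 2 : 1))
                return STRIPE_P;
        return STRIPE_DATA;
}

In recover_vertical() above, this index check replaces the old comparisons against the RAID5_P_STRIPE and RAID6_Q_STRIPE sentinel values that used to be stored in raid_map[].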
* We only care about data stripes recovery, @@ -1823,7 +1822,7 @@ static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr, goto pstripe; } - if (rbio->bioc->raid_map[failb] == RAID5_P_STRIPE) { + if (failb == rbio->real_stripes - 2) { raid6_datap_recov(rbio->real_stripes, sectorsize, faila, pointers); } else { @@ -2086,8 +2085,8 @@ static void fill_data_csums(struct btrfs_raid_bio *rbio) { struct btrfs_fs_info *fs_info = rbio->bioc->fs_info; struct btrfs_root *csum_root = btrfs_csum_root(fs_info, - rbio->bioc->raid_map[0]); - const u64 start = rbio->bioc->raid_map[0]; + rbio->bioc->full_stripe_logical); + const u64 start = rbio->bioc->full_stripe_logical; const u32 len = (rbio->nr_data * rbio->stripe_nsectors) << fs_info->sectorsize_bits; int ret; @@ -2135,7 +2134,7 @@ static void fill_data_csums(struct btrfs_raid_bio *rbio) */ btrfs_warn_rl(fs_info, "sub-stripe write for full stripe %llu is not safe, failed to get csum: %d", - rbio->bioc->raid_map[0], ret); + rbio->bioc->full_stripe_logical, ret); no_csum: kfree(rbio->csum_buf); bitmap_free(rbio->csum_bitmap); @@ -2391,10 +2390,10 @@ void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page, int stripe_offset; int index; - ASSERT(logical >= rbio->bioc->raid_map[0]); - ASSERT(logical + sectorsize <= rbio->bioc->raid_map[0] + + ASSERT(logical >= rbio->bioc->full_stripe_logical); + ASSERT(logical + sectorsize <= rbio->bioc->full_stripe_logical + BTRFS_STRIPE_LEN * rbio->nr_data); - stripe_offset = (int)(logical - rbio->bioc->raid_map[0]); + stripe_offset = (int)(logical - rbio->bioc->full_stripe_logical); index = stripe_offset / sectorsize; rbio->bio_sectors[index].page = page; rbio->bio_sectors[index].pgoff = pgoff; diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 83de9fecc7b6..ee3fe6c291fe 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1430,7 +1430,7 @@ static inline int scrub_nr_raid_mirrors(struct btrfs_io_context *bioc) } static inline void scrub_stripe_index_and_offset(u64 logical, u64 map_type, - u64 *raid_map, + u64 full_stripe_logical, int nstripes, int mirror, int *stripe_index, u64 *stripe_offset) @@ -1438,19 +1438,21 @@ static inline void scrub_stripe_index_and_offset(u64 logical, u64 map_type, int i; if (map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) { - /* RAID5/6 */ - for (i = 0; i < nstripes; i++) { - if (raid_map[i] == RAID6_Q_STRIPE || - raid_map[i] == RAID5_P_STRIPE) - continue; + const int nr_data_stripes = (map_type & BTRFS_BLOCK_GROUP_RAID5) ? 
+ nstripes - 1 : nstripes - 2; - if (logical >= raid_map[i] && - logical < raid_map[i] + BTRFS_STRIPE_LEN) + /* RAID5/6 */ + for (i = 0; i < nr_data_stripes; i++) { + const u64 data_stripe_start = full_stripe_logical + + (i * BTRFS_STRIPE_LEN); + + if (logical >= data_stripe_start && + logical < data_stripe_start + BTRFS_STRIPE_LEN) break; } *stripe_index = i; - *stripe_offset = logical - raid_map[i]; + *stripe_offset = logical - full_stripe_logical; } else { /* The other RAID type */ *stripe_index = mirror; @@ -1538,7 +1540,7 @@ static int scrub_setup_recheck_block(struct scrub_block *original_sblock, scrub_stripe_index_and_offset(logical, bioc->map_type, - bioc->raid_map, + bioc->full_stripe_logical, bioc->num_stripes - bioc->replace_nr_stripes, mirror_index, @@ -2398,7 +2400,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock) btrfs_bio_counter_inc_blocked(fs_info); ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical, &length, &bioc); - if (ret || !bioc || !bioc->raid_map) + if (ret || !bioc) goto bioc_out; if (WARN_ON(!sctx->is_dev_replace || @@ -3008,7 +3010,7 @@ static void scrub_parity_check_and_repair(struct scrub_parity *sparity) btrfs_bio_counter_inc_blocked(fs_info); ret = btrfs_map_sblock(fs_info, BTRFS_MAP_WRITE, sparity->logic_start, &length, &bioc); - if (ret || !bioc || !bioc->raid_map) + if (ret || !bioc) goto bioc_out; bio = bio_alloc(NULL, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 31a25281c7d8..d528e6aca8ee 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -5878,25 +5878,6 @@ static int find_live_mirror(struct btrfs_fs_info *fs_info, return preferred_mirror; } -/* Bubble-sort the stripe set to put the parity/syndrome stripes last */ -static void sort_parity_stripes(struct btrfs_io_context *bioc, int num_stripes) -{ - int i; - int again = 1; - - while (again) { - again = 0; - for (i = 0; i < num_stripes - 1; i++) { - /* Swap if parity is on a smaller index */ - if (bioc->raid_map[i] > bioc->raid_map[i + 1]) { - swap(bioc->stripes[i], bioc->stripes[i + 1]); - swap(bioc->raid_map[i], bioc->raid_map[i + 1]); - again = 1; - } - } - } -} - static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_info, u16 total_stripes) { @@ -5906,12 +5887,7 @@ static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_ /* The size of btrfs_io_context */ sizeof(struct btrfs_io_context) + /* Plus the variable array for the stripes */ - sizeof(struct btrfs_io_stripe) * (total_stripes) + - /* - * Plus the raid_map, which includes both the tgt dev - * and the stripes. - */ - sizeof(u64) * (total_stripes), + sizeof(struct btrfs_io_stripe) * (total_stripes), GFP_NOFS); if (!bioc) @@ -5920,8 +5896,8 @@ static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_ refcount_set(&bioc->refs, 1); bioc->fs_info = fs_info; - bioc->raid_map = (u64 *)(bioc->stripes + total_stripes); bioc->replace_stripe_src = -1; + bioc->full_stripe_logical = (u64)-1; return bioc; } @@ -6525,33 +6501,39 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, } bioc->map_type = map->type; - for (i = 0; i < num_stripes; i++) { - set_io_stripe(&bioc->stripes[i], map, stripe_index, stripe_offset, - stripe_nr); - stripe_index++; - } - - /* Build raid_map */ + /* + * For RAID56 full map, we need to make sure the stripes[] follows the + * rule that data stripes are all ordered, then followed with P and Q + * (if we have). 
+ * + * It's still mostly the same as other profiles, just with extra rotation. + */ if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK && need_raid_map && (need_full_stripe(op) || mirror_num > 1)) { - u64 tmp; - unsigned rot; - - /* Work out the disk rotation on this stripe-set */ - rot = stripe_nr % num_stripes; - - /* Fill in the logical address of each stripe */ - tmp = stripe_nr * data_stripes; - for (i = 0; i < data_stripes; i++) - bioc->raid_map[(i + rot) % num_stripes] = - em->start + ((tmp + i) << BTRFS_STRIPE_LEN_SHIFT); - - bioc->raid_map[(i + rot) % map->num_stripes] = RAID5_P_STRIPE; - if (map->type & BTRFS_BLOCK_GROUP_RAID6) - bioc->raid_map[(i + rot + 1) % num_stripes] = - RAID6_Q_STRIPE; - - sort_parity_stripes(bioc, num_stripes); + /* + * For RAID56 @stripe_nr is already the number of full stripes + * before us, which is also the rotation value (needs to modulo + * with num_stripes). + * + * In this case, we just add @stripe_nr with @i, then do the + * modulo, to reduce one modulo call. + */ + bioc->full_stripe_logical = em->start + + ((stripe_nr * data_stripes) << BTRFS_STRIPE_LEN_SHIFT); + for (i = 0; i < num_stripes; i++) + set_io_stripe(&bioc->stripes[i], map, + (i + stripe_nr) % num_stripes, + stripe_offset, stripe_nr); + } else { + /* + * For all other non-RAID56 profiles, just copy the target + * stripe into the bioc. + */ + for (i = 0; i < num_stripes; i++) { + set_io_stripe(&bioc->stripes[i], map, stripe_index, + stripe_offset, stripe_nr); + stripe_index++; + } } if (need_full_stripe(op)) diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 1da65538c7e2..6296cd9a72b2 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -459,11 +459,22 @@ struct btrfs_io_context { u16 replace_nr_stripes; s16 replace_stripe_src; /* - * logical block numbers for the start of each stripe - * The last one or two are p/q. These are sorted, - * so raid_map[0] is the start of our full stripe + * Logical bytenr of the full stripe start, only for RAID56 cases. + * + * When this value is set to other than (u64)-1, the stripes[] should + * follow this pattern: + * + * (real_stripes = num_stripes - replace_nr_stripes) + * (data_stripes = (is_raid6) ? (real_stripes - 2) : (real_stripes - 1)) + * + * stripes[0]: The first data stripe + * stripes[1]: The second data stripe + * ... + * stripes[data_stripes - 1]: The last data stripe + * stripes[data_stripes]: The P stripe + * stripes[data_stripes + 1]: The Q stripe (only for RAID6). */ - u64 *raid_map; + u64 full_stripe_logical; struct btrfs_io_stripe stripes[]; }; diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index 75d7d22c3a27..8ea9cea9bfeb 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -2422,7 +2422,7 @@ DECLARE_EVENT_CLASS(btrfs_raid56_bio, ), TP_fast_assign_btrfs(rbio->bioc->fs_info, - __entry->full_stripe = rbio->bioc->raid_map[0]; + __entry->full_stripe = rbio->bioc->full_stripe_logical; __entry->physical = bio->bi_iter.bi_sector << SECTOR_SHIFT; __entry->len = bio->bi_iter.bi_size; __entry->opf = bio_op(bio);
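To make the new RAID56 mapping concrete: as the comment above notes, stripe_nr at this point is the number of full stripes before the requested logical address within the chunk, so both the logical start of the full stripe and the per-full-stripe rotation follow from it directly. A standalone sketch of the arithmetic (fixed 64KiB stripe length, i.e. a shift of 16; simplified parameters, not the kernel function):

#include <stdint.h>
#include <stdio.h>

#define STRIPE_LEN_SHIFT 16     /* BTRFS_STRIPE_LEN is fixed at 64KiB */

/*
 * Print where each bioc slot of a RAID56 full-stripe map ends up.
 * Slot layout: data stripes in order, then P, then Q (RAID6 only);
 * slot i lands on chunk stripe (i + stripe_nr) % num_stripes because
 * parity rotates by one position per full stripe.
 */
static void map_full_stripe(uint64_t chunk_start, uint64_t stripe_nr,
                            int num_stripes, int data_stripes)
{
        uint64_t full_stripe_logical =
                chunk_start + ((stripe_nr * data_stripes) << STRIPE_LEN_SHIFT);
        int i;

        printf("full_stripe_logical = %llu\n",
               (unsigned long long)full_stripe_logical);
        for (i = 0; i < num_stripes; i++)
                printf("bioc slot %d -> chunk stripe %d\n",
                       i, (int)((i + stripe_nr) % num_stripes));
}

For a four-device RAID5 chunk (three data stripes plus P), map_full_stripe(chunk_start, 5, 4, 3) reports a full stripe starting 5 * 3 * 64KiB = 960KiB into the chunk, with data stripe 0 on chunk stripe 1, data 1 on 2, data 2 on 3 and P on 0, matching the stripes[] ordering documented in the new volumes.h comment.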