From patchwork Mon Jul 3 07:32:25 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 01/14] btrfs: raid56: remove unnecessary parameter for raid56_parity_alloc_scrub_rbio()
Date: Mon, 3 Jul 2023 15:32:25 +0800
Message-ID: <1f848dbf2089ca42a3c17e2f361e7aabe3e91507.1688368617.git.wqu@suse.com>

The parameter @stripe_nsectors is easily calculated as
BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits, and is already cached in
rbio->stripe_nsectors.

Thus we don't need to pass that parameter from the caller, and can
remove it safely.
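For reference, a minimal sketch of the relationship the message relies
on (the helper is hypothetical, shown only to spell out the arithmetic):

	/*
	 * Sketch only: why the parameter is redundant.
	 * BTRFS_STRIPE_LEN is fixed at 64K, so with a 4K sector size
	 * 64K >> 12 == 16 sectors per stripe.  alloc_rbio() already
	 * stores this value in rbio->stripe_nsectors, so the callee
	 * can use the cached value directly.
	 */
	static inline unsigned int stripe_nsectors(const struct btrfs_fs_info *fs_info)
	{
		return BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits;
	}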
Signed-off-by: Qu Wenruo
Reviewed-by: Anand Jain
Reviewed-by: Christoph Hellwig
---
 fs/btrfs/raid56.c | 4 ++--
 fs/btrfs/raid56.h | 2 +-
 fs/btrfs/scrub.c  | 3 +--
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 7d1cb4dd0f2c..cc783da065b4 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2335,7 +2335,7 @@ static void rmw_rbio_work_locked(struct work_struct *work)
 struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 				struct btrfs_io_context *bioc,
 				struct btrfs_device *scrub_dev,
-				unsigned long *dbitmap, int stripe_nsectors)
+				unsigned long *dbitmap)
 {
 	struct btrfs_fs_info *fs_info = bioc->fs_info;
 	struct btrfs_raid_bio *rbio;
@@ -2365,7 +2365,7 @@ struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 	}
 	ASSERT(i < rbio->real_stripes);

-	bitmap_copy(&rbio->dbitmap, dbitmap, stripe_nsectors);
+	bitmap_copy(&rbio->dbitmap, dbitmap, rbio->stripe_nsectors);
 	return rbio;
 }

diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index 45e6ff78316f..1871a2a16a1c 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -189,7 +189,7 @@ void raid56_parity_write(struct bio *bio, struct btrfs_io_context *bioc);
 struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 				struct btrfs_io_context *bioc,
 				struct btrfs_device *scrub_dev,
-				unsigned long *dbitmap, int stripe_nsectors);
+				unsigned long *dbitmap);
 void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio);

 void raid56_parity_cache_data_pages(struct btrfs_raid_bio *rbio,
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 38c103f13fd5..5a0626a3c23c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1914,8 +1914,7 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
 		btrfs_bio_counter_dec(fs_info);
 		goto out;
 	}
-	rbio = raid56_parity_alloc_scrub_rbio(bio, bioc, scrub_dev, &extent_bitmap,
-					      BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits);
+	rbio = raid56_parity_alloc_scrub_rbio(bio, bioc, scrub_dev, &extent_bitmap);
 	btrfs_put_bioc(bioc);
 	if (!rbio) {
 		ret = -ENOMEM;
From patchwork Mon Jul 3 07:32:26 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 02/14] btrfs: raid56: allow scrub operation to update both P and Q stripes
Date: Mon, 3 Jul 2023 15:32:26 +0800

Currently the scrub_rbio() path (entered through
raid56_parity_alloc_scrub_rbio() followed by
raid56_parity_submit_scrub_rbio()) can only handle one P/Q stripe at a
time. But inside finish_parity_scrub() we always do the P/Q generation
and verification for both P/Q stripes.

For the incoming scrub_logical feature we want to verify both P/Q
stripes in one go, so here we introduce the ability to scrub both of
them. To do that, the following things are modified:

- raid56_parity_alloc_scrub_rbio() now allows an empty @scrub_dev
  Thankfully we don't have any extra sanity checks rejecting such an
  empty parameter.

- Verify both P and Q stripes if @rbio->scrubp is 0
  The verification is now done in a loop, skipping a stripe if
  rbio->scrubp is non-zero and does not match the stripe number.

- Add bad sectors of both P/Q stripes to writeback
  Unfortunately we're using the same dbitmap, which means if only one
  of the P/Q stripes is corrupted, we would write back both. But the
  slight extra cost is acceptable.

Signed-off-by: Qu Wenruo
Reviewed-by: Christoph Hellwig
---
 fs/btrfs/raid56.c | 56 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 42 insertions(+), 14 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index cc783da065b4..d96bfc3a16fc 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2363,7 +2363,14 @@ struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 			break;
 		}
 	}
-	ASSERT(i < rbio->real_stripes);
+	/*
+	 * If @scrub_dev is specified, @scrubp must be the stripe number of
+	 * P or Q stripe.
+	 */
+	if (scrub_dev) {
+		ASSERT(rbio->scrubp >= rbio->nr_data);
+		ASSERT(rbio->scrubp < rbio->real_stripes);
+	}

 	bitmap_copy(&rbio->dbitmap, dbitmap, rbio->stripe_nsectors);
 	return rbio;
@@ -2426,7 +2433,8 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
	 * Replace is running and our P/Q stripe is being replaced, then we
	 * need to duplicate the final write to replace target.
	 */
-	if (bioc->replace_nr_stripes && bioc->replace_stripe_src == rbio->scrubp) {
+	if (rbio->scrubp && bioc->replace_nr_stripes &&
+	    bioc->replace_stripe_src == rbio->scrubp) {
 		is_replace = 1;
 		bitmap_copy(pbitmap, &rbio->dbitmap, rbio->stripe_nsectors);
 	}
@@ -2464,6 +2472,7 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)

 	for_each_set_bit(sectornr, &rbio->dbitmap, rbio->stripe_nsectors) {
 		struct sector_ptr *sector;
+		bool found_error = false;
 		void *parity;

 		/* first collect one page from each data stripe */
@@ -2484,14 +2493,22 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
 		}

 		/* Check scrubbing parity and repair it */
-		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
-		parity = kmap_local_page(sector->page) + sector->pgoff;
-		if (memcmp(parity, pointers[rbio->scrubp], sectorsize) != 0)
-			memcpy(parity, pointers[rbio->scrubp], sectorsize);
-		else
+		for (int i = rbio->nr_data; i < rbio->real_stripes; i++) {
+			/* Skip this stripe if it's not our scrub target. */
+			if (rbio->scrubp && i != rbio->scrubp)
+				continue;
+
+			sector = rbio_stripe_sector(rbio, i, sectornr);
+			parity = kmap_local_page(sector->page) + sector->pgoff;
+			if (memcmp(parity, pointers[i], sectorsize) != 0) {
+				memcpy(parity, pointers[i], sectorsize);
+				found_error = true;
+			}
+			kunmap_local(parity);
+		}
+		if (!found_error)
 			/* Parity is right, needn't writeback */
 			bitmap_clear(&rbio->dbitmap, sectornr, 1);
-		kunmap_local(parity);

 		for (stripe = nr_data - 1; stripe >= 0; stripe--)
 			kunmap_local(pointers[stripe]);
@@ -2514,11 +2531,16 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
 	for_each_set_bit(sectornr, &rbio->dbitmap, rbio->stripe_nsectors) {
 		struct sector_ptr *sector;

-		sector = rbio_stripe_sector(rbio, rbio->scrubp, sectornr);
-		ret = rbio_add_io_sector(rbio, &bio_list, sector, rbio->scrubp,
-					 sectornr, REQ_OP_WRITE);
-		if (ret)
-			goto cleanup;
+		for (int i = rbio->nr_data; i < rbio->real_stripes; i++) {
+			/* Skip this stripe if it's not our scrub target. */
+			if (rbio->scrubp && i != rbio->scrubp)
+				continue;
+			sector = rbio_stripe_sector(rbio, i, sectornr);
+			ret = rbio_add_io_sector(rbio, &bio_list, sector, i,
+						 sectornr, REQ_OP_WRITE);
+			if (ret)
+				goto cleanup;
+		}
 	}

 	if (!is_replace)
@@ -2529,6 +2551,12 @@ static int finish_parity_scrub(struct btrfs_raid_bio *rbio)
	 * the target device.  Check we have a valid source stripe number.
	 */
 	ASSERT(rbio->bioc->replace_stripe_src >= 0);
+	/*
+	 * For dev-replace case, scrubp must be provided by the caller at
+	 * raid56_parity_alloc_scrub_rbio().
+	 */
+	ASSERT(rbio->scrubp >= rbio->nr_data);
+	ASSERT(rbio->scrubp < rbio->real_stripes);
 	for_each_set_bit(sectornr, pbitmap, rbio->stripe_nsectors) {
 		struct sector_ptr *sector;
@@ -2625,7 +2653,7 @@ static int recover_scrub_rbio(struct btrfs_raid_bio *rbio)
	 * scrubbing parity, luckily, use the other one to repair the
	 * data, or we can not repair the data stripe.
	 */
-	if (failp != rbio->scrubp) {
+	if (rbio->scrubp && failp != rbio->scrubp) {
 		ret = -EIO;
 		goto out;
 	}
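To make the new @scrubp convention easier to follow, here is a minimal
sketch of the stripe-selection rule the loops above implement (the
helper name is made up for illustration, it is not part of the patch):

	/*
	 * Illustration only: with scrubp == 0 every parity stripe
	 * (P, and Q for RAID6) is a scrub target; with a non-zero
	 * scrubp only that one stripe is.
	 */
	static bool is_scrub_target(const struct btrfs_raid_bio *rbio, int stripe)
	{
		/* Parity stripes occupy [nr_data, real_stripes). */
		if (stripe < rbio->nr_data || stripe >= rbio->real_stripes)
			return false;
		return rbio->scrubp == 0 || stripe == rbio->scrubp;
	}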
From patchwork Mon Jul 3 07:32:27 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 03/14] btrfs: raid56: allow caching P/Q stripes
Date: Mon, 3 Jul 2023 15:32:27 +0800
Message-ID: <04691a8119ab8612a3f3bc4b003f44dcb189be86.1688368617.git.wqu@suse.com>

Currently raid56_parity_cache_scrub_pages() only allows caching data
stripes, as its only scrub call site is going to scrub P/Q stripes
anyway.

But later we want to use cached pages to do recovery, thus we need to
cache P/Q stripes too.
This patch adds that ability with the following changes:

- Rename the function to raid56_parity_cache_pages()

- Use @stripe_num to indicate where the cached stripe is

- Change the ASSERT() to allow P/Q stripes to be cached

- Introduce a new member, btrfs_raid_bio::cached, to allow the
  recovery path to use cached pages

Signed-off-by: Qu Wenruo
---
 fs/btrfs/raid56.c | 43 ++++++++++++++++++++++++++++++-------------
 fs/btrfs/raid56.h | 12 +++++++++---
 fs/btrfs/scrub.c  |  3 +--
 3 files changed, 40 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index d96bfc3a16fc..b2eb7e60a137 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1932,11 +1932,10 @@ static void recover_rbio(struct btrfs_raid_bio *rbio)

	/*
	 * Read everything that hasn't failed. However this time we will
-	 * not trust any cached sector.
+	 * not trust any cached sector, unless it's explicitly required.
+	 *
	 * As we may read out some stale data but higher layer is not reading
	 * that stale part.
-	 *
-	 * So here we always re-read everything in recovery path.
	 */
	for (total_sector_nr = 0; total_sector_nr < rbio->nr_sectors;
	     total_sector_nr++) {
@@ -1960,6 +1959,14 @@ static void recover_rbio(struct btrfs_raid_bio *rbio)
 		}

 		sector = rbio_stripe_sector(rbio, stripe, sectornr);
+
+		/*
+		 * If we're forced to use cache and the sector is already cached,
+		 * then we can skip this sector.
+		 * Otherwise we should still read from disk.
+		 */
+		if (rbio->cached && sector->uptodate)
+			continue;
 		ret = rbio_add_io_sector(rbio, &bio_list, sector, stripe,
 					 sectornr, REQ_OP_READ);
 		if (ret < 0) {
@@ -2765,22 +2772,32 @@ void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio)
 }

 /*
- * This is for scrub call sites where we already have correct data contents.
- * This allows us to avoid reading data stripes again.
+ * This is for scrub call sites where we already have correct stripe contents.
+ * This allows us to avoid reading on-disk stripes again.
  *
  * Unfortunately here we have to do page copy, other than reusing the pages.
  * This is due to the fact rbio has its own page management for its cache.
+ * But this is also a good thing for recovery tries, this prevents the recovery
+ * path to modify the stripes until we've verified the recovered data.
+ *
+ * @rbio:	 The allocated rbio by raid56_parity_alloc_*_rbio()
+ * @pages:	 The pages which contains the stripe contents.
+ * @stripe_num:	 The stripe number, 0 means the first data stripe, and
+ *		 @rbio->nr_data means the P stripe.
  */
-void raid56_parity_cache_data_pages(struct btrfs_raid_bio *rbio,
-				    struct page **data_pages, u64 data_logical)
+void raid56_parity_cache_pages(struct btrfs_raid_bio *rbio, struct page **pages,
+			       int stripe_num)
 {
-	const u64 offset_in_full_stripe = data_logical -
-					  rbio->bioc->full_stripe_logical;
+	const u64 offset_in_full_stripe = btrfs_stripe_nr_to_offset(stripe_num);
 	const int page_index = offset_in_full_stripe >> PAGE_SHIFT;
 	const u32 sectorsize = rbio->bioc->fs_info->sectorsize;
 	const u32 sectors_per_page = PAGE_SIZE / sectorsize;
 	int ret;

+	if (stripe_num >= rbio->nr_data)
+		ret = alloc_rbio_parity_pages(rbio);
+	else
+		ret = alloc_rbio_data_pages(rbio);
	/*
	 * If we hit ENOMEM temporarily, but later at
	 * raid56_parity_submit_scrub_rbio() time it succeeded, we just do
	 * the cache.
	 *
	 * If we hit ENOMEM later at raid56_parity_submit_scrub_rbio() time,
	 * the bio would got proper error number set.
	 */
-	ret = alloc_rbio_data_pages(rbio);
 	if (ret < 0)
 		return;

-	/* data_logical must be at stripe boundary and inside the full stripe. */
+	/* The stripe must be at stripe boundary and inside the full stripe. */
 	ASSERT(IS_ALIGNED(offset_in_full_stripe, BTRFS_STRIPE_LEN));
-	ASSERT(offset_in_full_stripe < btrfs_stripe_nr_to_offset(rbio->nr_data));
+	ASSERT(offset_in_full_stripe < btrfs_stripe_nr_to_offset(rbio->real_stripes));

 	for (int page_nr = 0; page_nr < (BTRFS_STRIPE_LEN >> PAGE_SHIFT); page_nr++) {
 		struct page *dst = rbio->stripe_pages[page_nr + page_index];
-		struct page *src = data_pages[page_nr];
+		struct page *src = pages[page_nr];

 		memcpy_page(dst, 0, src, 0, PAGE_SIZE);
 		for (int sector_nr = sectors_per_page * page_index;
 		     sector_nr < sectors_per_page * (page_index + 1);
 		     sector_nr++)
 			rbio->stripe_sectors[sector_nr].uptodate = true;
 	}
+	rbio->cached = true;
 }

diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index 1871a2a16a1c..41354a9f158a 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -82,6 +82,13 @@ struct btrfs_raid_bio {
	 */
 	int bio_list_bytes;

+	/*
+	 * If this rbio is forced to use cached stripes provided by the caller.
+	 *
+	 * Used by scrub path to reduce IO.
+	 */
+	bool cached;
+
 	refcount_t refs;

 	atomic_t stripes_pending;
@@ -191,9 +198,8 @@ struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 				struct btrfs_device *scrub_dev,
 				unsigned long *dbitmap);
 void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio);
-
-void raid56_parity_cache_data_pages(struct btrfs_raid_bio *rbio,
-				    struct page **data_pages, u64 data_logical);
+void raid56_parity_cache_pages(struct btrfs_raid_bio *rbio, struct page **pages,
+			       int stripe_num);

 int btrfs_alloc_stripe_hash_table(struct btrfs_fs_info *info);
 void btrfs_free_stripe_hash_table(struct btrfs_fs_info *info);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 5a0626a3c23c..1864856abb13 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1925,8 +1925,7 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,

 	for (int i = 0; i < data_stripes; i++) {
 		stripe = &sctx->raid56_data_stripes[i];
-		raid56_parity_cache_data_pages(rbio, stripe->pages,
-				full_stripe_start + btrfs_stripe_nr_to_offset(i));
+		raid56_parity_cache_pages(rbio, stripe->pages, i);
 	}
 	raid56_parity_submit_scrub_rbio(rbio);
 	wait_for_completion_io(&io_done);
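A minimal usage sketch of the renamed helper, assuming verified stripe
contents are already held by the caller (the wrapper function and the
@cached array are hypothetical; only raid56_parity_cache_pages() and
the stripe numbering come from the patch):

	/*
	 * Sketch only: cache every stripe of a full stripe into an rbio.
	 * stripe_num 0..nr_data-1 are data stripes, nr_data is P, and
	 * nr_data + 1 is Q (for RAID6).
	 */
	static void cache_full_stripe(struct btrfs_raid_bio *rbio,
				      struct page ***cached, int nr_stripes)
	{
		for (int i = 0; i < nr_stripes; i++)
			raid56_parity_cache_pages(rbio, cached[i], i);
	}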
From patchwork Mon Jul 3 07:32:28 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 04/14] btrfs: raid56: add the interfaces to submit recovery rbio
Date: Mon, 3 Jul 2023 15:32:28 +0800
Message-ID: <33215d978e2d8423aedde0c79d733204010c2adb.1688368617.git.wqu@suse.com>

The incoming scrub_logical feature needs to recover raid56 data sectors
using cached pages. This means we can not go through the regular
btrfs_submit_bio() path, but need a path similar to
raid56_parity_alloc_scrub_rbio().

So this patch adds the following new functions to allow a recovery rbio
to be allocated and submitted outside of the btrfs_submit_bio() path:

- raid56_parity_alloc_recover_rbio()
- raid56_parity_submit_recover_rbio()

With these we can do a fully cached recovery without reading any pages
from disk.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/raid56.c | 31 +++++++++++++++++++++++++++++++
 fs/btrfs/raid56.h |  3 +++
 2 files changed, 34 insertions(+)

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index b2eb7e60a137..0340220463c6 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -2383,6 +2383,31 @@ struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 	return rbio;
 }

+/*
+ * Alloc a recovery rbio out of the regular btrfs_submit_bio() path.
+ *
+ * This allows scrub caller to use cached pages to reduce IO.
+ */
+struct btrfs_raid_bio *raid56_parity_alloc_recover_rbio(struct bio *bio,
+		struct btrfs_io_context *bioc, int mirror_num)
+{
+	struct btrfs_fs_info *fs_info = bioc->fs_info;
+	struct btrfs_raid_bio *rbio;
+
+	rbio = alloc_rbio(fs_info, bioc);
+	if (IS_ERR(rbio))
+		return NULL;
+	/* We should have some sectors that really need to be recovered. */
+	ASSERT(bio->bi_iter.bi_size);
+	bio_list_add(&rbio->bio_list, bio);
+	set_rbio_range_error(rbio, bio);
+	rbio->operation = BTRFS_RBIO_READ_REBUILD;
+	if (mirror_num > 2)
+		set_rbio_raid6_extra_error(rbio, mirror_num);
+
+	return rbio;
+}
+
 /*
  * We just scrub the parity that we have correct data on the same horizontal,
  * so we needn't allocate all pages for all the stripes.
  */
@@ -2771,6 +2796,12 @@ void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio)
 		start_async_work(rbio, scrub_rbio_work_locked);
 }

+void raid56_parity_submit_recover_rbio(struct btrfs_raid_bio *rbio)
+{
+	if (!lock_stripe_add(rbio))
+		start_async_work(rbio, recover_rbio_work_locked);
+}
+
 /*
  * This is for scrub call sites where we already have correct stripe contents.
  * This allows us to avoid reading on-disk stripes again.
diff --git a/fs/btrfs/raid56.h b/fs/btrfs/raid56.h
index 41354a9f158a..6f146eb86832 100644
--- a/fs/btrfs/raid56.h
+++ b/fs/btrfs/raid56.h
@@ -197,7 +197,10 @@ struct btrfs_raid_bio *raid56_parity_alloc_scrub_rbio(struct bio *bio,
 				struct btrfs_io_context *bioc,
 				struct btrfs_device *scrub_dev,
 				unsigned long *dbitmap);
+struct btrfs_raid_bio *raid56_parity_alloc_recover_rbio(struct bio *bio,
+		struct btrfs_io_context *bioc, int mirror_num);
 void raid56_parity_submit_scrub_rbio(struct btrfs_raid_bio *rbio);
+void raid56_parity_submit_recover_rbio(struct btrfs_raid_bio *rbio);
 void raid56_parity_cache_pages(struct btrfs_raid_bio *rbio, struct page **pages,
 			       int stripe_num);
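Putting patches 3 and 4 together, a fully cached recovery could look
roughly like the sketch below (the wrapper function and the @cached
array are hypothetical; error handling is elided and the bio/bioc setup
is assumed to come from the scrub_logical caller):

	/* Sketch of a fully cached recovery, under the stated assumptions. */
	static void recover_with_cache(struct bio *bio,
				       struct btrfs_io_context *bioc,
				       struct page ***cached, int nr_stripes,
				       int mirror_num)
	{
		struct btrfs_raid_bio *rbio;

		rbio = raid56_parity_alloc_recover_rbio(bio, bioc, mirror_num);
		if (!rbio)
			return;	/* Real code would report -ENOMEM. */

		/* Prime the rbio cache so no on-disk reads are needed. */
		for (int i = 0; i < nr_stripes; i++)
			raid56_parity_cache_pages(rbio, cached[i], i);

		raid56_parity_submit_recover_rbio(rbio);
	}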
From patchwork Mon Jul 3 07:32:29 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 05/14] btrfs: add the ability to read P/Q stripes directly
Date: Mon, 3 Jul 2023 15:32:29 +0800
Message-ID: <0d54adfb390b3c9ffd8ff8f98db43de51df9c33e.1688368617.git.wqu@suse.com>

Currently there is no way to read the P/Q stripes of a RAID56 full
stripe directly. Normally a caller would use btrfs_map_block() and
fetch the physical location of the P/Q stripes directly.

But with the recent scrub rework that is no longer so elegant, as the
full scrub code is based on a mirror_num based solution. Thus this
patch adds two special mirror_num values for this usage:

- Mirror -1 for P stripes
- Mirror -2 for Q stripes

Both mirrors only support read for now, and the caller should make sure
the range start is stripe aligned.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c   |  2 +-
 fs/btrfs/volumes.c | 30 +++++++++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 1864856abb13..41c514db0793 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1601,7 +1601,7 @@ static void scrub_submit_initial_read(struct scrub_ctx *sctx,
 	int mirror = stripe->mirror_num;

 	ASSERT(stripe->bg);
-	ASSERT(stripe->mirror_num > 0);
+	ASSERT(stripe->mirror_num);
 	ASSERT(test_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &stripe->state));

 	bbio = btrfs_bio_alloc(SCRUB_STRIPE_PAGES, REQ_OP_READ, fs_info,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index cd632b3f579f..c22007bd830b 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6251,10 +6251,12 @@ static void set_io_stripe(struct btrfs_io_stripe *dst, const struct map_lookup
  *
  *			For RAID56 profile, mirror 1 means rebuild from P and
  *			the remaining data stripes.
+ *			And mirror -1 means read P stripes directly, -2 means
+ *			read Q stripes directly.
  *
  *			For RAID6 profile, mirror > 2 means mark another
  *			data/P stripe error and rebuild from the remaining
- *			stripes..
+ *			stripes.
  *
  * @need_raid_map:	(Used only for integrity checker) whether the map wants
  *                      a full stripe map (including all data and P/Q stripes)
@@ -6297,6 +6299,12 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 		goto out_free_em;
 	}

+	/* Only allow mirror_num < 0 for RAID56. */
+	if (mirror_num < 0 && !(map->type & BTRFS_BLOCK_GROUP_RAID56_MASK)) {
+		free_extent_map(em);
+		return -EINVAL;
+	}
+
 	data_stripes = nr_data_stripes(map);

 	map_offset = logical - em->start;
@@ -6382,6 +6390,26 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 				logical;
 			stripe_index = 0;
 			stripe_offset = 0;
+		} else if (mirror_num < 0) {
+			/* Only allow READ for direct P/Q operations. */
+			ASSERT(op == BTRFS_MAP_READ);
+			/*
+			 * Caller should make sure the range start is stripe
+			 * aligned.
+			 */
+			ASSERT(stripe_offset == 0);
+
+			/*
+			 * Stripes of @bioc is already sorted, stripes[0] is the
+			 * first data stripe and stripes[@data_stripes] is the
+			 * P stripe.
+			 * So we only need to update the @stripe_index to the
+			 * specified stripe, and set @stripe_nr/@stripe_offset
+			 * to 0, so we can return the beginning of the P/Q stripe.
+			 */
+			stripe_offset = 0;
+			stripe_nr = 0;
+			stripe_index = data_stripes + (-mirror_num - 1);
 		} else {
			/*
			 * Mirror #0 or #1 means the original data block.
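For clarity, the index math above works out as follows; a minimal
sketch reusing only names from the hunk (the helper itself is
hypothetical):

	/*
	 * Illustration only: with data_stripes == 4 (e.g. 6-disk RAID6),
	 * stripes[] is sorted as [D0 D1 D2 D3 P Q], so:
	 *   mirror_num == -1  ->  4 + (1 - 1) == 4  (P)
	 *   mirror_num == -2  ->  4 + (2 - 1) == 5  (Q)
	 */
	static int pq_stripe_index(int data_stripes, int mirror_num)
	{
		/* Caller guarantees mirror_num is -1 or -2 on a RAID56 chunk. */
		return data_stripes + (-mirror_num - 1);
	}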
From patchwork Mon Jul 3 07:32:30 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Johannes Thumshirn
Subject: [PATCH 06/14] btrfs: scrub: make scrub_ctx::stripes dynamically allocated
Date: Mon, 3 Jul 2023 15:32:30 +0800

Currently scrub_ctx::stripes is a fixed-size array, which is fine for
most use cases. But later we may want to allocate a larger array for
logical bytenr based scrub.

So here we change the member to a dynamically allocated array.

This also affects its lifespan. Now it only needs to be allocated and
initialized at the beginning of the scrub_stripe() function.
Reviewed-by: Johannes Thumshirn
Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 67 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 51 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 41c514db0793..1e49bb066619 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -172,7 +172,7 @@ struct scrub_stripe {
 };

 struct scrub_ctx {
-	struct scrub_stripe	stripes[SCRUB_STRIPES_PER_SCTX];
+	struct scrub_stripe	*stripes;
 	struct scrub_stripe	*raid56_data_stripes;
 	struct btrfs_fs_info	*fs_info;
 	int			first_free;
@@ -181,6 +181,9 @@ struct scrub_ctx {
 	int			readonly;
 	int			sectors_per_bio;

+	/* Number of stripes we have in @stripes. */
+	unsigned int		nr_stripes;
+
 	/* State of IO submission throttling affecting the associated device */
 	ktime_t			throttle_deadline;
 	u64			throttle_sent;
@@ -308,16 +311,24 @@ static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info)
 	scrub_pause_off(fs_info);
 }

+static void free_scrub_stripes(struct scrub_ctx *sctx)
+{
+	if (!sctx->stripes)
+		return;
+
+	for (int i = 0; i < sctx->nr_stripes; i++)
+		release_scrub_stripe(&sctx->stripes[i]);
+	kfree(sctx->stripes);
+	sctx->nr_stripes = 0;
+	sctx->stripes = NULL;
+}
+
 static noinline_for_stack void scrub_free_ctx(struct scrub_ctx *sctx)
 {
-	int i;
-
 	if (!sctx)
 		return;

-	for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++)
-		release_scrub_stripe(&sctx->stripes[i]);
-
+	free_scrub_stripes(sctx);
 	kfree(sctx);
 }

@@ -331,7 +342,6 @@ static noinline_for_stack struct scrub_ctx *scrub_setup_ctx(
 		struct btrfs_fs_info *fs_info, int is_dev_replace)
 {
 	struct scrub_ctx *sctx;
-	int		i;

 	sctx = kzalloc(sizeof(*sctx), GFP_KERNEL);
 	if (!sctx)
@@ -339,14 +349,6 @@ static noinline_for_stack struct scrub_ctx *scrub_setup_ctx(
 	refcount_set(&sctx->refs, 1);
 	sctx->is_dev_replace = is_dev_replace;
 	sctx->fs_info = fs_info;
-	for (i = 0; i < SCRUB_STRIPES_PER_SCTX; i++) {
-		int ret;
-
-		ret = init_scrub_stripe(fs_info, &sctx->stripes[i]);
-		if (ret < 0)
-			goto nomem;
-		sctx->stripes[i].sctx = sctx;
-	}
 	sctx->first_free = 0;
 	atomic_set(&sctx->cancel_req, 0);

@@ -1659,6 +1661,7 @@ static int flush_scrub_stripes(struct scrub_ctx *sctx)
 	const int nr_stripes = sctx->cur_stripe;
 	int ret = 0;

+	ASSERT(nr_stripes <= sctx->nr_stripes);
 	if (!nr_stripes)
 		return 0;

@@ -1753,8 +1756,11 @@ static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *
 	struct scrub_stripe *stripe;
 	int ret;

+	ASSERT(sctx->stripes);
+	ASSERT(sctx->nr_stripes);
+
 	/* No available slot, submit all stripes and wait for them. */
-	if (sctx->cur_stripe >= SCRUB_STRIPES_PER_SCTX) {
+	if (sctx->cur_stripe >= sctx->nr_stripes) {
 		ret = flush_scrub_stripes(sctx);
 		if (ret < 0)
 			return ret;
@@ -2076,6 +2082,30 @@ static int scrub_simple_stripe(struct scrub_ctx *sctx,
 	return ret;
 }

+static int alloc_scrub_stripes(struct scrub_ctx *sctx, int nr_stripes)
+{
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	int ret;
+
+	ASSERT(!sctx->stripes);
+	ASSERT(!sctx->nr_stripes);
+	sctx->stripes = kcalloc(nr_stripes, sizeof(struct scrub_stripe),
+				GFP_KERNEL);
+	if (!sctx->stripes)
+		return -ENOMEM;
+	sctx->nr_stripes = nr_stripes;
+	for (int i = 0; i < sctx->nr_stripes; i++) {
+		ret = init_scrub_stripe(fs_info, &sctx->stripes[i]);
+		if (ret < 0)
+			goto cleanup;
+		sctx->stripes[i].sctx = sctx;
+	}
+	return 0;
+cleanup:
+	free_scrub_stripes(sctx);
+	return -ENOMEM;
+}
+
 static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 					   struct btrfs_block_group *bg,
 					   struct extent_map *em,
@@ -2102,6 +2132,10 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,

 	scrub_blocked_if_needed(fs_info);

+	ret = alloc_scrub_stripes(sctx, SCRUB_STRIPES_PER_SCTX);
+	if (ret < 0)
+		return ret;
+
 	if (sctx->is_dev_replace &&
 	    btrfs_dev_is_sequential(sctx->wr_tgtdev, physical)) {
 		mutex_lock(&sctx->wr_lock);
@@ -2224,6 +2258,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		kfree(sctx->raid56_data_stripes);
 		sctx->raid56_data_stripes = NULL;
 	}
+	free_scrub_stripes(sctx);

 	if (sctx->is_dev_replace && ret >= 0) {
 		int ret2;
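The new lifecycle is worth spelling out; a hedged sketch of the pattern
scrub_stripe() follows after this patch (the wrapper name is made up,
and error paths are abbreviated):

	/* Sketch only: per-call allocation replacing the old fixed array. */
	static int scrub_one_range(struct scrub_ctx *sctx)
	{
		int ret;

		ret = alloc_scrub_stripes(sctx, SCRUB_STRIPES_PER_SCTX);
		if (ret < 0)
			return ret;

		/* ... queue_scrub_stripe() / flush_scrub_stripes() work ... */

		free_scrub_stripes(sctx);	/* Paired with the allocation above. */
		return 0;
	}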
From patchwork Mon Jul 3 07:32:31 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Cc: Johannes Thumshirn
Subject: [PATCH 07/14] btrfs: scrub: introduce the skeleton for logical-scrub
Date: Mon, 3 Jul 2023 15:32:31 +0800

Currently btrfs scrub is a per-device operation. This is fine for most
non-RAID56 profiles, but not a good thing for RAID56 profiles.

The main challenge for RAID56 is that we read data stripes more than
once (once for the data stripe scrub itself, and again for each
parity).

To address this, and maybe even improve the non-RAID56 scrub, here we
introduce a new scrub flag, SCRUB_LOGICAL. This is a per-fs operation,
and conflicts with any dev-scrub/dev-replace.

This patch only implements the basic exclusion checks.

Reviewed-by: Johannes Thumshirn
Signed-off-by: Qu Wenruo
---
 fs/btrfs/disk-io.c         |  1 +
 fs/btrfs/fs.h              | 12 +++++++
 fs/btrfs/ioctl.c           |  6 +++-
 fs/btrfs/scrub.c           | 72 +++++++++++++++++++++++++++++++++++++-
 fs/btrfs/scrub.h           |  2 ++
 include/uapi/linux/btrfs.h | 11 ++++--
 6 files changed, 100 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 7513388b0567..c11f28becaf2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1850,6 +1850,7 @@ static void btrfs_init_scrub(struct btrfs_fs_info *fs_info)
 {
 	mutex_init(&fs_info->scrub_lock);
 	atomic_set(&fs_info->scrubs_running, 0);
+	atomic_set(&fs_info->scrubs_logical_running, 0);
 	atomic_set(&fs_info->scrub_pause_req, 0);
 	atomic_set(&fs_info->scrubs_paused, 0);
 	atomic_set(&fs_info->scrub_cancel_req, 0);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 203d2a267828..f3d50f97ab27 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -632,7 +632,19 @@ struct btrfs_fs_info {

 	/* Private scrub information */
 	struct mutex scrub_lock;
+
+	/*
+	 * Number of running scrubbing, including both dev-scrub (at most
+	 * one dev-scrub on each device) and logical-scrub (at most
+	 * one logical-scrub for each fs).
+	 */
 	atomic_t scrubs_running;
+
+	/*
+	 * Number of running logical scrubbing, there is at most one running
+	 * logical scrub for each fs.
+	 */
+	atomic_t scrubs_logical_running;
 	atomic_t scrub_pause_req;
 	atomic_t scrubs_paused;
 	atomic_t scrub_cancel_req;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index edbbd5cf23fc..1a1afcce73e0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3179,7 +3179,11 @@ static long btrfs_ioctl_scrub(struct file *file, void __user *arg)
 		goto out;
 	}

-	ret = btrfs_scrub_dev(fs_info, sa->devid, sa->start, sa->end,
+	if (sa->flags & BTRFS_SCRUB_LOGICAL)
+		ret = btrfs_scrub_logical(fs_info, sa->start, sa->end,
+				&sa->progress, sa->flags & BTRFS_SCRUB_READONLY);
+	else
+		ret = btrfs_scrub_dev(fs_info, sa->devid, sa->start, sa->end,
 			      &sa->progress, sa->flags & BTRFS_SCRUB_READONLY,
 			      0);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 1e49bb066619..ff637f83aa0e 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -178,7 +178,8 @@ struct scrub_ctx {
 	int			first_free;
 	int			cur_stripe;
 	atomic_t		cancel_req;
-	int			readonly;
+	bool			readonly;
+	bool			scrub_logical;
 	int			sectors_per_bio;

 	/* Number of stripes we have in @stripes. */
@@ -2835,6 +2836,14 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		goto out;
 	}

+	/* Dev-scrub conflicts with logical-scrub. */
+	if (atomic_read(&fs_info->scrubs_logical_running)) {
+		mutex_unlock(&fs_info->scrub_lock);
+		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+		ret = -EINPROGRESS;
+		goto out;
+	}
+
 	down_read(&fs_info->dev_replace.rwsem);
 	if (dev->scrub_ctx ||
 	    (!is_dev_replace &&
@@ -2945,6 +2954,67 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	return ret;
 }

+int btrfs_scrub_logical(struct btrfs_fs_info *fs_info, u64 start, u64 end,
+			struct btrfs_scrub_progress *progress, bool readonly)
+{
+	struct scrub_ctx *sctx;
+	int ret;
+
+	if (btrfs_fs_closing(fs_info))
+		return -EAGAIN;
+
+	/* At mount time we have ensured nodesize is in the range of [4K, 64K]. */
+	ASSERT(fs_info->nodesize <= BTRFS_STRIPE_LEN);
+
+	sctx = scrub_setup_ctx(fs_info, false);
+	if (IS_ERR(sctx))
+		return PTR_ERR(sctx);
+	sctx->scrub_logical = true;
+	sctx->readonly = readonly;
+
+	ret = scrub_workers_get(fs_info, false);
+	if (ret)
+		goto out_free_ctx;
+
+	/* Make sure we're the only running scrub. */
+	mutex_lock(&fs_info->scrub_lock);
+	if (atomic_read(&fs_info->scrubs_running)) {
+		mutex_unlock(&fs_info->scrub_lock);
+		ret = -EINPROGRESS;
+		goto out;
+	}
+	down_read(&fs_info->dev_replace.rwsem);
+	if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
+		up_read(&fs_info->dev_replace.rwsem);
+		mutex_unlock(&fs_info->scrub_lock);
+		ret = -EINPROGRESS;
+		goto out;
+	}
+	up_read(&fs_info->dev_replace.rwsem);
+	/*
+	 * Checking @scrub_pause_req here, we can avoid
+	 * race between committing transaction and scrubbing.
+	 */
+	__scrub_blocked_if_needed(fs_info);
+	atomic_inc(&fs_info->scrubs_running);
+	atomic_inc(&fs_info->scrubs_logical_running);
+	mutex_unlock(&fs_info->scrub_lock);
+
+	/* The main work would be implemented. */
+	ret = -EOPNOTSUPP;
+
+	atomic_dec(&fs_info->scrubs_running);
+	atomic_dec(&fs_info->scrubs_logical_running);
+	wake_up(&fs_info->scrub_pause_wait);
+	if (progress)
+		memcpy(progress, &sctx->stat, sizeof(*progress));
+out:
+	scrub_workers_put(fs_info);
+out_free_ctx:
+	scrub_free_ctx(sctx);
+	return ret;
+}
+
 void btrfs_scrub_pause(struct btrfs_fs_info *fs_info)
 {
 	mutex_lock(&fs_info->scrub_lock);
diff --git a/fs/btrfs/scrub.h b/fs/btrfs/scrub.h
index 7639103ebf9d..22db0f71083e 100644
--- a/fs/btrfs/scrub.h
+++ b/fs/btrfs/scrub.h
@@ -6,6 +6,8 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		    u64 end, struct btrfs_scrub_progress *progress,
 		    int readonly, int is_dev_replace);
+int btrfs_scrub_logical(struct btrfs_fs_info *fs_info, u64 start, u64 end,
+			struct btrfs_scrub_progress *progress, bool readonly);
 void btrfs_scrub_pause(struct btrfs_fs_info *fs_info);
 void btrfs_scrub_continue(struct btrfs_fs_info *fs_info);
 int btrfs_scrub_cancel(struct btrfs_fs_info *info);
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index dbb8b96da50d..fa5cb3b3611b 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -186,8 +186,15 @@ struct btrfs_scrub_progress {
 					 * Intermittent error. */
 };

-#define BTRFS_SCRUB_READONLY	1
-#define BTRFS_SCRUB_SUPPORTED_FLAGS	(BTRFS_SCRUB_READONLY)
+#define BTRFS_SCRUB_READONLY	(1ULL << 0)
+
+/*
+ * Regular scrub is based on device, while with this flag, scrub is
+ * based on logical, and @devid would be ignored.
+ */
+#define BTRFS_SCRUB_LOGICAL	(1ULL << 1)
+#define BTRFS_SCRUB_SUPPORTED_FLAGS	(BTRFS_SCRUB_READONLY |\
+					 BTRFS_SCRUB_LOGICAL)
 struct btrfs_ioctl_scrub_args {
 	__u64 devid;				/* in */
 	__u64 start;				/* in */
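From userspace the new flag would be used roughly as below; a hedged
sketch assuming the updated uapi header is installed (the ioctl itself
and struct btrfs_ioctl_scrub_args are existing btrfs uapi, only the
BTRFS_SCRUB_LOGICAL flag is new in this patch):

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/btrfs.h>

	/* Sketch: request a logical (per-fs) scrub; devid is ignored. */
	static int start_logical_scrub(int fs_fd)
	{
		struct btrfs_ioctl_scrub_args sa;

		memset(&sa, 0, sizeof(sa));
		sa.start = 0;
		sa.end = (__u64)-1;		/* Whole logical address space. */
		sa.flags = BTRFS_SCRUB_LOGICAL;

		/* Until the rest of the series lands this returns -EOPNOTSUPP. */
		return ioctl(fs_fd, BTRFS_IOC_SCRUB, &sa);
	}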
From patchwork Mon Jul 3 07:32:32 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 08/14] btrfs: scrub: extract the common preparation before scrubbing a block group
Date: Mon, 3 Jul 2023 15:32:32 +0800

Introduce a new helper, prepare_scrub_block_group(), to handle the
checks before scrubbing a block group.

For now the helper is only called by scrub_enumerate_chunks(), but it
will be reused by the later scrub_logical feature.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 264 ++++++++++++++++++++++++++---------------------
 1 file changed, 147 insertions(+), 117 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ff637f83aa0e..806c4683a7ef 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2343,6 +2343,143 @@ static int finish_extent_writes_for_zoned(struct btrfs_root *root,
 	return btrfs_commit_transaction(trans);
 }

+/*
+ * Do the preparation before we scrub the block group.
+ *
+ * Return >0 if we don't need to scrub the block group.
+ * Return 0 if we have done the preparation and set @ro_set_ret.
+ * Return <0 for error and can not scrub the bg.
+ */
+static int prepare_scrub_block_group(struct scrub_ctx *sctx,
+				     struct btrfs_block_group *bg,
+				     bool *ro_set_ret)
+{
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	struct btrfs_root *root = fs_info->dev_root;
+	bool ro_set = false;
+	int ret;
+
+	if (sctx->is_dev_replace && btrfs_is_zoned(fs_info)) {
+		if (!test_bit(BLOCK_GROUP_FLAG_TO_COPY, &bg->runtime_flags))
+			return 1;
+	}
+
+	/*
+	 * Make sure that while we are scrubbing the corresponding block
+	 * group doesn't get its logical address and its device extents
+	 * reused for another block group, which can possibly be of a
+	 * different type and different profile. We do this to prevent
+	 * false error detections and crashes due to bogus attempts to
+	 * repair extents.
+	 */
+	spin_lock(&bg->lock);
+	if (test_bit(BLOCK_GROUP_FLAG_REMOVED, &bg->runtime_flags)) {
+		spin_unlock(&bg->lock);
+		return 1;
+	}
+	btrfs_freeze_block_group(bg);
+	spin_unlock(&bg->lock);
+
+	/*
+	 * we need call btrfs_inc_block_group_ro() with scrubs_paused,
+	 * to avoid deadlock caused by:
+	 * btrfs_inc_block_group_ro()
+	 * -> btrfs_wait_for_commit()
+	 * -> btrfs_commit_transaction()
+	 * -> btrfs_scrub_pause()
+	 */
+	scrub_pause_on(fs_info);
+
+	/*
+	 * Don't do chunk preallocation for scrub.
+	 *
+	 * This is especially important for SYSTEM bgs, or we can hit
+	 * -EFBIG from btrfs_finish_chunk_alloc() like:
+	 * 1. The only SYSTEM bg is marked RO.
+	 *    Since SYSTEM bg is small, that's pretty common.
+	 * 2. New SYSTEM bg will be allocated
+	 *    Due to regular version will allocate new chunk.
+	 * 3. New SYSTEM bg is empty and will get cleaned up
+	 *    Before cleanup really happens, it's marked RO again.
+	 * 4. Empty SYSTEM bg get scrubbed
+	 *    We go back to 2.
+	 *
+	 * This can easily boost the amount of SYSTEM chunks if cleaner
+	 * thread can't be triggered fast enough, and use up all space
+	 * of btrfs_super_block::sys_chunk_array
+	 *
+	 * While for dev replace, we need to try our best to mark block
+	 * group RO, to prevent race between:
+	 * - Write duplication
+	 *   Contains latest data
+	 * - Scrub copy
+	 *   Contains data from commit tree
+	 *
+	 * If target block group is not marked RO, nocow writes can
+	 * be overwritten by scrub copy, causing data corruption.
+	 * So for dev-replace, it's not allowed to continue if a block
+	 * group is not RO.
+	 */
+	ret = btrfs_inc_block_group_ro(bg, sctx->is_dev_replace);
+	if (!ret && sctx->is_dev_replace) {
+		ret = finish_extent_writes_for_zoned(root, bg);
+		if (ret) {
+			btrfs_dec_block_group_ro(bg);
+			scrub_pause_off(fs_info);
+			return ret;
+		}
+	}
+
+	if (ret == 0)
+		ro_set = 1;
+	if (ret == -ENOSPC && !sctx->is_dev_replace &&
+	    !(bg->flags & BTRFS_BLOCK_GROUP_RAID56_MASK))
+		/*
+		 * btrfs_inc_block_group_ro return -ENOSPC when it
+		 * failed in creating new chunk for metadata.
+		 * It is not a problem for scrub, because
+		 * metadata are always cowed, and our scrub paused
+		 * commit_transactions.
+		 *
+		 * For RAID56 chunks, we have to mark them read-only
+		 * for scrub, as later we would use our own cache
+		 * out of RAID56 realm.
+		 * Thus we want the RAID56 bg to be marked RO to
+		 * prevent RMW from screwing up out cache.
+		 */
+		ro_set = 0;
+
+	if (ret == -ETXTBSY) {
+		btrfs_warn(fs_info,
+	   "skipping scrub of block group %llu due to active swapfile",
+			   bg->start);
+		btrfs_unfreeze_block_group(bg);
+		scrub_pause_off(fs_info);
+		return ret;
+	}
+	if (ret < 0) {
+		btrfs_warn(fs_info,
+			   "failed setting block group ro: %d", ret);
+		btrfs_unfreeze_block_group(bg);
+		scrub_pause_off(fs_info);
+		return ret;
+	}
+
+	/*
+	 * Now the target block is marked RO, wait for nocow writes to
+	 * finish before dev-replace.
+	 * COW is fine, as COW never overwrites extents in commit tree.
+	 */
+	if (sctx->is_dev_replace) {
+		btrfs_wait_nocow_writers(bg);
+		btrfs_wait_ordered_roots(fs_info, U64_MAX, bg->start,
+					 bg->length);
+	}
+	scrub_pause_off(fs_info);
+	*ro_set_ret = ro_set;
+	return 0;
+}
+
 static noinline_for_stack
 int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 			   struct btrfs_device *scrub_dev, u64 start, u64 end)
@@ -2353,7 +2490,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 	struct btrfs_root *root = fs_info->dev_root;
 	u64 chunk_offset;
 	int ret = 0;
-	int ro_set;
 	int slot;
 	struct extent_buffer *l;
 	struct btrfs_key key;
@@ -2374,6 +2510,7 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 	key.type = BTRFS_DEV_EXTENT_KEY;

 	while (1) {
+		bool ro_set = false;
 		u64 dev_extent_len;

 		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
@@ -2455,127 +2592,20 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 			goto skip;
 		}

-		if (sctx->is_dev_replace && btrfs_is_zoned(fs_info)) {
-			if (!test_bit(BLOCK_GROUP_FLAG_TO_COPY, &cache->runtime_flags)) {
-				btrfs_put_block_group(cache);
-				goto skip;
-			}
+		ret = prepare_scrub_block_group(sctx, cache, &ro_set);
+		if (ret == -ETXTBSY) {
+			ret = 0;
+			goto skip_unfreeze;
 		}
-
-		/*
-		 * Make sure that while we are scrubbing the corresponding block
-		 * group doesn't get its logical address and its device extents
-		 * reused for another block group, which can possibly be of a
-		 * different type and different profile. We do this to prevent
-		 * false error detections and crashes due to bogus attempts to
-		 * repair extents.
- */ - spin_lock(&cache->lock); - if (test_bit(BLOCK_GROUP_FLAG_REMOVED, &cache->runtime_flags)) { - spin_unlock(&cache->lock); + if (ret < 0) { + btrfs_put_block_group(cache); + break; + } + if (ret > 0) { btrfs_put_block_group(cache); goto skip; } - btrfs_freeze_block_group(cache); - spin_unlock(&cache->lock); - /* - * we need call btrfs_inc_block_group_ro() with scrubs_paused, - * to avoid deadlock caused by: - * btrfs_inc_block_group_ro() - * -> btrfs_wait_for_commit() - * -> btrfs_commit_transaction() - * -> btrfs_scrub_pause() - */ - scrub_pause_on(fs_info); - - /* - * Don't do chunk preallocation for scrub. - * - * This is especially important for SYSTEM bgs, or we can hit - * -EFBIG from btrfs_finish_chunk_alloc() like: - * 1. The only SYSTEM bg is marked RO. - * Since SYSTEM bg is small, that's pretty common. - * 2. New SYSTEM bg will be allocated - * Due to regular version will allocate new chunk. - * 3. New SYSTEM bg is empty and will get cleaned up - * Before cleanup really happens, it's marked RO again. - * 4. Empty SYSTEM bg get scrubbed - * We go back to 2. - * - * This can easily boost the amount of SYSTEM chunks if cleaner - * thread can't be triggered fast enough, and use up all space - * of btrfs_super_block::sys_chunk_array - * - * While for dev replace, we need to try our best to mark block - * group RO, to prevent race between: - * - Write duplication - * Contains latest data - * - Scrub copy - * Contains data from commit tree - * - * If target block group is not marked RO, nocow writes can - * be overwritten by scrub copy, causing data corruption. - * So for dev-replace, it's not allowed to continue if a block - * group is not RO. - */ - ret = btrfs_inc_block_group_ro(cache, sctx->is_dev_replace); - if (!ret && sctx->is_dev_replace) { - ret = finish_extent_writes_for_zoned(root, cache); - if (ret) { - btrfs_dec_block_group_ro(cache); - scrub_pause_off(fs_info); - btrfs_put_block_group(cache); - break; - } - } - - if (ret == 0) { - ro_set = 1; - } else if (ret == -ENOSPC && !sctx->is_dev_replace && - !(cache->flags & BTRFS_BLOCK_GROUP_RAID56_MASK)) { - /* - * btrfs_inc_block_group_ro return -ENOSPC when it - * failed in creating new chunk for metadata. - * It is not a problem for scrub, because - * metadata are always cowed, and our scrub paused - * commit_transactions. - * - * For RAID56 chunks, we have to mark them read-only - * for scrub, as later we would use our own cache - * out of RAID56 realm. - * Thus we want the RAID56 bg to be marked RO to - * prevent RMW from screwing up out cache. - */ - ro_set = 0; - } else if (ret == -ETXTBSY) { - btrfs_warn(fs_info, - "skipping scrub of block group %llu due to active swapfile", - cache->start); - scrub_pause_off(fs_info); - ret = 0; - goto skip_unfreeze; - } else { - btrfs_warn(fs_info, - "failed setting block group ro: %d", ret); - btrfs_unfreeze_block_group(cache); - btrfs_put_block_group(cache); - scrub_pause_off(fs_info); - break; - } - - /* - * Now the target block is marked RO, wait for nocow writes to - * finish before dev-replace. - * COW is fine, as COW never overwrites extents in commit tree. 
-		 */
-		if (sctx->is_dev_replace) {
-			btrfs_wait_nocow_writers(cache);
-			btrfs_wait_ordered_roots(fs_info, U64_MAX, cache->start,
-						 cache->length);
-		}
-
-		scrub_pause_off(fs_info);
 		down_write(&dev_replace->rwsem);
 		dev_replace->cursor_right = found_key.offset + dev_extent_len;
 		dev_replace->cursor_left = found_key.offset;

From patchwork Mon Jul 3 07:32:33 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 09/14] btrfs: scrub: implement the chunk iteration code for scrub_logical
Date: Mon, 3 Jul 2023 15:32:33 +0800

This patch implements the code to iterate chunks for the scrub_logical
feature.

The differences between scrub_logical and scrub_dev are:

- No need to look up the dev-extent tree
  scrub_logical can grab the block group directly and check it against
  the logical range, so we don't need to do the dev-extent lookup.

- No dev-replace related work
  scrub_logical is only meant to enhance full-filesystem scrub, so there
  is no need for the dev-replace related checks. A side-by-side sketch of
  the two lookup strategies follows below.
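To make the contrast concrete, here is an illustration-only fragment
distilled from the surrounding diffs (the dev-extent walk in
scrub_enumerate_chunks() and the block group walk added by this patch);
it is a sketch under those assumptions, not code taken verbatim from
either path:

	/* Per-device scrub: walk the dev-extent tree, keyed by device. */
	key.objectid = scrub_dev->devid;
	key.type = BTRFS_DEV_EXTENT_KEY;
	key.offset = start;
	ret = btrfs_search_slot(NULL, fs_info->dev_root, &key, path, 0, 0);
	/* ... then map every dev extent found back to its block group ... */

	/* Logical scrub: jump straight to the block group at a logical address. */
	bg = btrfs_lookup_first_block_group(fs_info, cur_logical);
	/* ... scrub bg, then advance cur_logical to bg->start + bg->length ... */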
For now only the block group iteration code is implemented; the real
scrub part is not yet in place.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 64 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 806c4683a7ef..79e47eee9d1b 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2984,6 +2984,69 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 	return ret;
 }
 
+static int scrub_logical_chunks(struct scrub_ctx *sctx, u64 start, u64 end)
+{
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	u64 cur = start;
+	int ret = 0;
+
+	while (cur < end) {
+		struct btrfs_block_group *bg;
+		bool ro_set = false;
+
+		bg = btrfs_lookup_first_block_group(fs_info, cur);
+
+		/* No more block groups. */
+		if (!bg)
+			break;
+		if (bg->start > end)
+			break;
+
+		ret = prepare_scrub_block_group(sctx, bg, &ro_set);
+		if (ret == -ETXTBSY) {
+			ret = 0;
+			goto next;
+		}
+		if (ret > 0)
+			goto next;
+		if (ret < 0) {
+			btrfs_put_block_group(bg);
+			break;
+		}
+
+		/* The real work starts here. */
+		ret = -EOPNOTSUPP;
+
+		if (ro_set)
+			btrfs_dec_block_group_ro(bg);
+		/*
+		 * We might have prevented the cleaner kthread from deleting
+		 * this block group if it was already unused because we raced
+		 * and set it to RO mode first. So add it back to the unused
+		 * list, otherwise it might not ever be deleted unless a manual
+		 * balance is triggered or it becomes used and unused again.
+		 */
+		spin_lock(&bg->lock);
+		if (!test_bit(BLOCK_GROUP_FLAG_REMOVED, &bg->runtime_flags) &&
+		    !bg->ro && bg->reserved == 0 && bg->used == 0) {
+			spin_unlock(&bg->lock);
+			if (btrfs_test_opt(fs_info, DISCARD_ASYNC))
+				btrfs_discard_queue_work(&fs_info->discard_ctl, bg);
+			else
+				btrfs_mark_bg_unused(bg);
+		} else {
+			spin_unlock(&bg->lock);
+		}
+		btrfs_unfreeze_block_group(bg);
+next:
+		cur = bg->start + bg->length;
+		btrfs_put_block_group(bg);
+		if (ret < 0)
+			break;
+	}
+	return ret;
+}
+
 int btrfs_scrub_logical(struct btrfs_fs_info *fs_info, u64 start, u64 end,
			 struct btrfs_scrub_progress *progress, bool readonly)
 {
@@ -3030,8 +3093,7 @@ int btrfs_scrub_logical(struct btrfs_fs_info *fs_info, u64 start, u64 end,
 	atomic_inc(&fs_info->scrubs_logical_running);
 	mutex_unlock(&fs_info->scrub_lock);
 
-	/* The main work would be implemented. */
-	ret = -EOPNOTSUPP;
+	ret = scrub_logical_chunks(sctx, start, end);
 
 	atomic_dec(&fs_info->scrubs_running);
 	atomic_dec(&fs_info->scrubs_logical_running);

From patchwork Mon Jul 3 07:32:34 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 10/14] btrfs: scrub: implement the basic extent iteration code
Date: Mon, 3 Jul 2023 15:32:34 +0800
Message-ID: <6834a68474c0b42fde9ee100133f93eda12066b7.1688368617.git.wqu@suse.com>

Unlike the per-device scrub code, the logical bytenr based scrub code has
a much easier time iterating extents. The difference is mainly in the
stripe queueing part, as we need to queue @nr_copies stripes at once.

So here we introduce a helper, scrub_copy_stripe(): fill the first stripe,
then use that helper to fill the remaining copies (the pattern is sketched
below).

For now, we still use the same flush_scrub_stripes(), but the repair part
is less efficient.
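As an illustration only, the queueing pattern described above boils down
to the following sketch, distilled from queue_scrub_logical_stripes() in
the diff below (the stripes[] array stands in for sctx->stripes):

	/* Fill mirror 1 of this logical stripe from the commit roots. */
	ret = scrub_find_fill_first_stripe(bg, NULL, 0, 1, logical,
					   bg->start + bg->length - logical,
					   &stripes[0]);
	if (ret)
		return ret;	/* >0: no more extents, <0: error */

	/* Clone it for the remaining mirrors; only the mirror number differs. */
	for (int i = 1; i < nr_copies; i++) {
		scrub_copy_stripe(&stripes[0], &stripes[i]);
		stripes[i].mirror_num = i + 1;
	}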
Signed-off-by: Qu Wenruo --- fs/btrfs/scrub.c | 163 ++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 160 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 79e47eee9d1b..02b212fe4712 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1471,7 +1471,7 @@ static void scrub_stripe_reset_bitmaps(struct scrub_stripe *stripe) static int scrub_find_fill_first_stripe(struct btrfs_block_group *bg, struct btrfs_device *dev, u64 physical, int mirror_num, u64 logical_start, - u32 logical_len, + u64 logical_len, struct scrub_stripe *stripe) { struct btrfs_fs_info *fs_info = bg->fs_info; @@ -2984,6 +2984,164 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start, return ret; } +static void scrub_copy_stripe(const struct scrub_stripe *src, + struct scrub_stripe *dst) +{ + struct scrub_ctx *sctx = src->sctx; + struct btrfs_fs_info *fs_info = sctx->fs_info; + + /* This function should only be called for logical scrub. */ + ASSERT(sctx->scrub_logical); + scrub_reset_stripe(dst); + + dst->bg = src->bg; + dst->dev = NULL; + dst->physical = 0; + dst->logical = src->logical; + dst->mirror_num = src->mirror_num; + + bitmap_copy(&dst->extent_sector_bitmap, &src->extent_sector_bitmap, + src->nr_sectors); + memcpy(dst->csums, src->csums, + (BTRFS_STRIPE_LEN >> fs_info->sectorsize_bits) * + fs_info->csum_size); + + for (int i = 0; i < src->nr_sectors; i++) { + dst->sectors[i].is_metadata = src->sectors[i].is_metadata; + if (src->sectors[i].is_metadata) + dst->sectors[i].generation = src->sectors[i].generation; + else if (src->sectors[i].csum) + dst->sectors[i].csum = dst->csums + i * fs_info->csum_size; + } + dst->nr_data_extents = src->nr_data_extents; + dst->nr_meta_extents = src->nr_meta_extents; + set_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &dst->state); +} + +/* + * Unlike queue_scrub_stripe() which only queue one stripe, this queue all + * mirrors for non-RAID56 profiles, or the full stripe for RAID56. + */ +static int queue_scrub_logical_stripes(struct scrub_ctx *sctx, + struct btrfs_block_group *bg, u64 logical) +{ + const int raid_index = btrfs_bg_flags_to_raid_index(bg->flags); + const int nr_copies = btrfs_raid_array[raid_index].ncopies; + const int old_cur = sctx->cur_stripe; + struct scrub_stripe *stripe; + int ret; + + ASSERT(sctx->stripes); + ASSERT(sctx->nr_stripes); + + if (sctx->cur_stripe + nr_copies > sctx->nr_stripes) { + ret = flush_scrub_stripes(sctx); + if (ret < 0) + return ret; + } + + stripe = &sctx->stripes[old_cur]; + + scrub_reset_stripe(stripe); + ret = scrub_find_fill_first_stripe(bg, NULL, 0, 1, + logical, bg->start + bg->length - logical, stripe); + /* Either >0 as no more extents or <0 for error. */ + if (ret) + return ret; + + /* For the remaining slots, just copy the above mirror. 
*/ + for (int i = 1; i < nr_copies; i++) { + struct scrub_stripe *dst = &sctx->stripes[old_cur + i]; + + scrub_copy_stripe(stripe, dst); + dst->mirror_num = i + 1; + } + sctx->cur_stripe += nr_copies; + return 0; +} + +static int scrub_logical_one_chunk(struct scrub_ctx *sctx, + struct btrfs_block_group *bg) +{ + struct btrfs_fs_info *fs_info = sctx->fs_info; + struct extent_map_tree *map_tree = &fs_info->mapping_tree; + struct extent_map *em; + u64 cur = bg->start; + const int raid_index = btrfs_bg_flags_to_raid_index(bg->flags); + const int nr_copies = btrfs_raid_array[raid_index].ncopies; + int nr_stripes; + int ret = 0; + int flush_ret; + + read_lock(&map_tree->lock); + em = lookup_extent_mapping(map_tree, bg->start, bg->length); + read_unlock(&map_tree->lock); + if (!em) { + /* + * Might have been an unused block group deleted by the cleaner + * kthread or relocation. + */ + spin_lock(&bg->lock); + if (!test_bit(BLOCK_GROUP_FLAG_REMOVED, &bg->runtime_flags)) + ret = -EINVAL; + spin_unlock(&bg->lock); + return ret; + } + + scrub_blocked_if_needed(fs_info); + + /* RAID56 not yet supported. */ + if (bg->flags & BTRFS_BLOCK_GROUP_RAID56_MASK) { + ret = -EOPNOTSUPP; + goto out; + } + + nr_stripes = SCRUB_STRIPES_PER_SCTX * nr_copies; + ret = alloc_scrub_stripes(sctx, nr_stripes); + if (ret < 0) + goto out; + + while (cur < bg->start + bg->length) { + /* Canceled? */ + if (atomic_read(&fs_info->scrub_cancel_req) || + atomic_read(&sctx->cancel_req)) { + ret = -ECANCELED; + break; + } + /* Paused? */ + if (atomic_read(&fs_info->scrub_pause_req)) { + /* Push queued extents */ + scrub_blocked_if_needed(fs_info); + } + /* Block group removed? */ + spin_lock(&bg->lock); + if (test_bit(BLOCK_GROUP_FLAG_REMOVED, &bg->runtime_flags)) { + spin_unlock(&bg->lock); + ret = 0; + break; + } + spin_unlock(&bg->lock); + + ret = queue_scrub_logical_stripes(sctx, bg, cur); + if (ret > 0) { + ret = 0; + break; + } + if (ret < 0) + break; + ASSERT(sctx->cur_stripe >= 1); + cur = sctx->stripes[sctx->cur_stripe - 1].logical + BTRFS_STRIPE_LEN; + } +out: + flush_ret = flush_scrub_stripes(sctx); + if (!ret) + ret = flush_ret; + free_scrub_stripes(sctx); + free_extent_map(em); + return ret; + +} + static int scrub_logical_chunks(struct scrub_ctx *sctx, u64 start, u64 end) { struct btrfs_fs_info *fs_info = sctx->fs_info; @@ -3014,8 +3172,7 @@ static int scrub_logical_chunks(struct scrub_ctx *sctx, u64 start, u64 end) break; } - /* The real work starts here. 
 */
-	ret = -EOPNOTSUPP;
+	ret = scrub_logical_one_chunk(sctx, bg);
 
 	if (ro_set)
 		btrfs_dec_block_group_ro(bg);

From patchwork Mon Jul 3 07:32:35 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 11/14] btrfs: scrub: implement the repair part of scrub logical
Date: Mon, 3 Jul 2023 15:32:35 +0800

The repair part of scrub logical is done differently compared to the
per-device counterpart:

- We read out all mirrors in one go
  Since it's no longer per-device, we just read out all the mirrors.

- Find a good mirror for the same sectornr across all mirrors

- Copy the good content to any corrupted sectors

This has several advantages:

- Less IO wait
  Since all IO is submitted at the very beginning, we avoid the
  read-then-wait pattern of the per-device scrub. This applies to both
  the read and write parts.

This needs some changes to the per-device scrub code though:

- Call scrub_verify_one_stripe() inside scrub_read_endio()
  This improves performance, as we can do csum verification per-mirror.

- Do not queue the scrub_stripe_read_repair_worker() workload for
  scrub_logical, since we don't need to go through the per-device repair
  path.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 148 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 138 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 02b212fe4712..21b3fa55d0f0 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -995,7 +995,6 @@ static void scrub_stripe_read_repair_worker(struct work_struct *work)
 	ASSERT(stripe->mirror_num > 0);
 
 	wait_scrub_stripe_io(stripe);
-	scrub_verify_one_stripe(stripe, stripe->extent_sector_bitmap);
 	/* Save the initial failed bitmap for later repair and report usage. */
 	stripe->init_error_bitmap = stripe->error_bitmap;
 	stripe->init_nr_io_errors = bitmap_weight(&stripe->io_error_bitmap,
@@ -1068,9 +1067,12 @@ static void scrub_read_endio(struct btrfs_bio *bbio)
 	}
 	bio_put(&bbio->bio);
 	if (atomic_dec_and_test(&stripe->pending_io)) {
+		scrub_verify_one_stripe(stripe, stripe->extent_sector_bitmap);
 		wake_up(&stripe->io_wait);
-		INIT_WORK(&stripe->work, scrub_stripe_read_repair_worker);
-		queue_work(stripe->bg->fs_info->scrub_workers, &stripe->work);
+		if (!stripe->sctx->scrub_logical) {
+			INIT_WORK(&stripe->work, scrub_stripe_read_repair_worker);
+			queue_work(stripe->bg->fs_info->scrub_workers, &stripe->work);
+		}
 	}
 }
 
@@ -1655,7 +1657,131 @@ static bool stripe_has_metadata_error(struct scrub_stripe *stripe)
 	return false;
 }
 
-static int flush_scrub_stripes(struct scrub_ctx *sctx)
+/*
+ * Unlike the per-device repair, we have all mirrors read out already.
+ *
+ * Thus we only need to find out the good mirror, and copy the content to
+ * any bad sectors.
+ */
+static void repair_one_mirror_group(struct scrub_ctx *sctx, int start_stripe,
+				    int ncopies)
+{
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	struct scrub_stripe *first_stripe = &sctx->stripes[start_stripe];
+	struct scrub_stripe *cur_stripe;
+	const u32 sectorsize = fs_info->sectorsize;
+	int sectornr;
+
+	ASSERT(start_stripe + ncopies <= sctx->cur_stripe);
+
+	for_each_set_bit(sectornr, &first_stripe->extent_sector_bitmap,
+			 first_stripe->nr_sectors) {
+		struct scrub_stripe *good_stripe;
+		int good_mirror = -1;
+
+		for (int cur_mirror = start_stripe;
+		     cur_mirror < start_stripe + ncopies; cur_mirror++) {
+			cur_stripe = &sctx->stripes[cur_mirror];
+
+			if (!test_bit(sectornr, &cur_stripe->error_bitmap)) {
+				good_mirror = cur_mirror;
+				break;
+			}
+		}
+		/* No good mirror found, this vertical stripe cannot be repaired. */
+		if (good_mirror < 0)
+			continue;
+
+		good_stripe = &sctx->stripes[good_mirror];
+
+		for (int cur_mirror = start_stripe;
+		     cur_mirror < start_stripe + ncopies; cur_mirror++) {
+			cur_stripe = &sctx->stripes[cur_mirror];
+
+			if (!test_bit(sectornr, &cur_stripe->error_bitmap))
+				continue;
+			/* Repair from the good mirror.
*/ + memcpy_page(scrub_stripe_get_page(cur_stripe, sectornr), + scrub_stripe_get_page_offset(cur_stripe, sectornr), + scrub_stripe_get_page(good_stripe, sectornr), + scrub_stripe_get_page_offset(good_stripe, sectornr), + sectorsize); + clear_bit(sectornr, &cur_stripe->error_bitmap); + clear_bit(sectornr, &cur_stripe->io_error_bitmap); + if (cur_stripe->sectors[sectornr].is_metadata) + clear_bit(sectornr, &cur_stripe->meta_error_bitmap); + else + clear_bit(sectornr, &cur_stripe->csum_error_bitmap); + } + } + for (int cur_mirror = start_stripe; cur_mirror < start_stripe + ncopies; + cur_mirror++) { + cur_stripe = &sctx->stripes[cur_mirror]; + set_bit(SCRUB_STRIPE_FLAG_REPAIR_DONE, &cur_stripe->state); + scrub_stripe_report_errors(sctx, cur_stripe); + wake_up(&cur_stripe->repair_wait); + + if (btrfs_is_zoned(fs_info)) { + if (!bitmap_empty(&cur_stripe->init_error_bitmap, + cur_stripe->nr_sectors)) { + btrfs_repair_one_zone(fs_info, cur_stripe->logical); + break; + } + } + if (!sctx->readonly) { + unsigned long repaired; + + bitmap_andnot(&repaired, &cur_stripe->init_error_bitmap, + &cur_stripe->error_bitmap, + cur_stripe->nr_sectors); + scrub_write_sectors(sctx, cur_stripe, repaired, false); + } + } + /* Wait for above writeback to finish. */ + for (int cur_mirror = start_stripe; cur_mirror < start_stripe + ncopies; + cur_mirror++) { + cur_stripe = &sctx->stripes[cur_mirror]; + + wait_scrub_stripe_io(cur_stripe); + } +} + +static int handle_logical_stripes(struct scrub_ctx *sctx, + struct btrfs_block_group *bg) +{ + const int nr_stripes = sctx->cur_stripe; + const int group_stripes = btrfs_bg_type_to_factor(bg->flags); + struct scrub_stripe *stripe; + + for (int i = 0; i < nr_stripes; i++) { + stripe = &sctx->stripes[i]; + + scrub_submit_initial_read(sctx, stripe); + } + for (int i = 0; i < nr_stripes; i++) { + stripe = &sctx->stripes[i]; + + wait_scrub_stripe_io(stripe); + + /* Save the initial failed bitmap for later repair and report usage. */ + stripe->init_error_bitmap = stripe->error_bitmap; + stripe->init_nr_io_errors = + bitmap_weight(&stripe->io_error_bitmap, stripe->nr_sectors); + stripe->init_nr_csum_errors = + bitmap_weight(&stripe->csum_error_bitmap, stripe->nr_sectors); + stripe->init_nr_meta_errors = + bitmap_weight(&stripe->meta_error_bitmap, stripe->nr_sectors); + } + + for (int i = 0; i < nr_stripes; i += group_stripes) + repair_one_mirror_group(sctx, i, group_stripes); + sctx->cur_stripe = 0; + + return 0; +} + +static int flush_scrub_stripes(struct scrub_ctx *sctx, + struct btrfs_block_group *bg) { struct btrfs_fs_info *fs_info = sctx->fs_info; struct scrub_stripe *stripe; @@ -1666,6 +1792,9 @@ static int flush_scrub_stripes(struct scrub_ctx *sctx) if (!nr_stripes) return 0; + if (sctx->scrub_logical) + return handle_logical_stripes(sctx, bg); + ASSERT(test_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &sctx->stripes[0].state)); scrub_throttle_dev_io(sctx, sctx->stripes[0].dev, @@ -1762,7 +1891,7 @@ static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group * /* No available slot, submit all stripes and wait for them. 
 */
	if (sctx->cur_stripe >= sctx->nr_stripes) {
-		ret = flush_scrub_stripes(sctx);
+		ret = flush_scrub_stripes(sctx, bg);
 		if (ret < 0)
 			return ret;
 	}
@@ -2250,7 +2379,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 		break;
 	}
 out:
-	ret2 = flush_scrub_stripes(sctx);
+	ret2 = flush_scrub_stripes(sctx, bg);
 	if (!ret)
 		ret = ret2;
 	if (sctx->raid56_data_stripes) {
@@ -3025,8 +3154,7 @@ static void scrub_copy_stripe(const struct scrub_stripe *src,
 static int queue_scrub_logical_stripes(struct scrub_ctx *sctx,
 				       struct btrfs_block_group *bg, u64 logical)
 {
-	const int raid_index = btrfs_bg_flags_to_raid_index(bg->flags);
-	const int nr_copies = btrfs_raid_array[raid_index].ncopies;
+	const int nr_copies = btrfs_bg_type_to_factor(bg->flags);
 	const int old_cur = sctx->cur_stripe;
 	struct scrub_stripe *stripe;
 	int ret;
@@ -3035,7 +3163,7 @@ static int queue_scrub_logical_stripes(struct scrub_ctx *sctx,
 	ASSERT(sctx->nr_stripes);
 
 	if (sctx->cur_stripe + nr_copies > sctx->nr_stripes) {
-		ret = flush_scrub_stripes(sctx);
+		ret = flush_scrub_stripes(sctx, bg);
 		if (ret < 0)
 			return ret;
 	}
@@ -3133,7 +3261,7 @@ static int scrub_logical_one_chunk(struct scrub_ctx *sctx,
 		cur = sctx->stripes[sctx->cur_stripe - 1].logical + BTRFS_STRIPE_LEN;
 	}
 out:
-	flush_ret = flush_scrub_stripes(sctx);
+	flush_ret = flush_scrub_stripes(sctx, bg);
 	if (!ret)
 		ret = flush_ret;
 	free_scrub_stripes(sctx);

From patchwork Mon Jul 3 07:32:36 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 12/14] btrfs: scrub: add RAID56 support for queue_scrub_logical_stripes()
Date: Mon, 3 Jul 2023 15:32:36 +0800
Message-ID: <9e1666311b1e944fd666e97f77af59b198717676.1688368617.git.wqu@suse.com>

There are several factors that need to be considered for a RAID56 full
stripe:

- Empty data stripes
  We still need those empty data stripes, as they may be involved in the
  P/Q recovery.

- P/Q stripes
  We need special handling for the P/Q stripes.

So this patch introduces a new helper, queue_raid56_full_stripes(), to
handle those special requirements.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 108 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 105 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 21b3fa55d0f0..f37b9fd30595 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -105,7 +105,7 @@ struct scrub_stripe {
 	u64 logical;
 	u64 physical;
 
-	u16 mirror_num;
+	s16 mirror_num;
 
 	/* Should be BTRFS_STRIPE_LEN / sectorsize. */
 	u16 nr_sectors;
@@ -3147,15 +3147,109 @@ static void scrub_copy_stripe(const struct scrub_stripe *src,
 	set_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &dst->state);
 }
 
+static int queue_raid56_full_stripes(struct scrub_ctx *sctx,
+		struct btrfs_block_group *bg, struct map_lookup *map,
+		u64 logical)
+{
+	const int num_stripes = map->num_stripes;
+	const int old_cur = sctx->cur_stripe;
+	const int data_stripes = nr_data_stripes(map);
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	struct btrfs_root *extent_root = btrfs_extent_root(fs_info, bg->start);
+	struct btrfs_path path = { 0 };
+	unsigned long extents_bitmap = 0;
+	u64 extent_start;
+	u64 extent_len;
+	u64 extent_flags;
+	u64 extent_gen;
+	u64 full_stripe_start;
+	u32 full_stripe_nr;
+	int ret;
+
+	path.search_commit_root = 1;
+	path.skip_locking = 1;
+
+	ret = find_first_extent_item(extent_root, &path, logical,
+				     bg->start + bg->length - logical);
+	/* Error or no more extents found. */
+	if (ret)
+		return ret;
+	get_extent_info(&path, &extent_start, &extent_len, &extent_flags,
+			&extent_gen);
+	btrfs_release_path(&path);
+
+	full_stripe_nr = rounddown((extent_start - bg->start) >>
+				   BTRFS_STRIPE_LEN_SHIFT, data_stripes);
+	full_stripe_start = btrfs_stripe_nr_to_offset(full_stripe_nr) + bg->start;
+
+	for (int i = 0; i < data_stripes; i++) {
+		struct scrub_stripe *stripe = &sctx->stripes[old_cur + i];
+		u64 cur = full_stripe_start + btrfs_stripe_nr_to_offset(i);
+
+		scrub_reset_stripe(stripe);
+		ret = scrub_find_fill_first_stripe(bg, NULL, 0, 1, cur,
+						   BTRFS_STRIPE_LEN, stripe);
+		if (ret < 0)
+			return ret;
+		/*
+		 * No extent in the data stripe, we need to fill the info
+		 * manually.
+ */ + if (ret > 0) { + stripe->logical = cur; + stripe->physical = 0; + stripe->dev = NULL; + stripe->bg = bg; + stripe->mirror_num = 1; + set_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &stripe->state); + ret = 0; + } + bitmap_or(&extents_bitmap, &extents_bitmap, + &stripe->extent_sector_bitmap, stripe->nr_sectors); + } + + /* Manually fill the P/Q stripes */ + for (int i = 0; i < num_stripes - data_stripes; i++) { + struct scrub_stripe *stripe = + &sctx->stripes[old_cur + data_stripes + i]; + scrub_reset_stripe(stripe); + /* + * Use the logical of the last data stripes, so the caller + * knows exactly where the next logical should start, no matter + * if it's RAID56 or not. + */ + stripe->logical = full_stripe_start + + btrfs_stripe_nr_to_offset(data_stripes - 1); + stripe->dev = NULL; + stripe->bg = bg; + stripe->mirror_num = -1 - i; + bitmap_copy(&stripe->extent_sector_bitmap, &extents_bitmap, + stripe->nr_sectors); + /* + * For parity stripes, they don't need any csum checks, + * so mark them as nocsum data. + */ + for (int sector = 0; sector < stripe->nr_sectors; sector++) { + stripe->sectors[sector].is_metadata = false; + stripe->sectors[sector].csum = NULL; + } + set_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &stripe->state); + } + sctx->cur_stripe += num_stripes; + return 0; +} + /* * Unlike queue_scrub_stripe() which only queue one stripe, this queue all * mirrors for non-RAID56 profiles, or the full stripe for RAID56. */ static int queue_scrub_logical_stripes(struct scrub_ctx *sctx, - struct btrfs_block_group *bg, u64 logical) + struct btrfs_block_group *bg, struct map_lookup *map, + u64 logical) { const int nr_copies = btrfs_bg_type_to_factor(bg->flags); const int old_cur = sctx->cur_stripe; + bool is_raid56 = bg->flags & BTRFS_BLOCK_GROUP_RAID56_MASK; struct scrub_stripe *stripe; int ret; @@ -3168,6 +3262,14 @@ static int queue_scrub_logical_stripes(struct scrub_ctx *sctx, return ret; } + /* + * For RAID56, we need to queue the full stripe. If there is some data + * stripes containing no sectors, we still need to queue those empty + * stripes for later recovery and P/Q verification. 
+	 */
+	if (is_raid56)
+		return queue_raid56_full_stripes(sctx, bg, map, logical);
+
 	stripe = &sctx->stripes[old_cur];
 
 	scrub_reset_stripe(stripe);
@@ -3250,7 +3352,7 @@ static int scrub_logical_one_chunk(struct scrub_ctx *sctx,
 		}
 		spin_unlock(&bg->lock);
 
-		ret = queue_scrub_logical_stripes(sctx, bg, cur);
+		ret = queue_scrub_logical_stripes(sctx, bg, em->map_lookup, cur);
 		if (ret > 0) {
 			ret = 0;
 			break;

From patchwork Mon Jul 3 07:32:37 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 13/14] btrfs: scrub: introduce the RAID56 data recovery path for scrub logical
Date: Mon, 3 Jul 2023 15:32:37 +0800

The recovery path uses cached stripe contents to make sure we won't waste
IO on re-reading the same contents from disk. This is done by reusing the
raid56_parity_cache_pages() function to cache the data and P/Q stripes;
the flow is sketched below.
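As a quick orientation before the details, the cached-recovery flow boils
down to the following illustration-only sketch (distilled from
submit_cached_recover_bio() in the diff below; setup and error handling
omitted, and stripes[] stands in for the group's scrub_stripe array):

	/* All stripes of the full stripe have already been read into memory. */
	rbio = raid56_parity_alloc_recover_rbio(bio, bioc, mirror);
	for (int i = 0; i < group_stripes; i++) {
		/* Seed the recovery rbio with the cached stripe pages. */
		raid56_parity_cache_pages(rbio, stripes[i].pages, i);
	}
	raid56_parity_submit_recover_rbio(rbio);
	/* A synchronous wait is affordable: no extra disk reads are needed. */
	wait_for_completion_io(&io_done);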
Since we need to call raid56_parity_cache_pages(), we cannot take the same
btrfs_submit_bio() path for the repair, thus we introduce the following
two helpers:

- submit_cached_recover_bio()
  This does all the necessary preparation: grab a bioc and an rbio, fill
  the cache, then submit the bio and wait for it.

- scrub_submit_cached_raid56_repair_read()
  This does mostly the same work as scrub_stripe_submit_repair_read(),
  but since we cannot go through btrfs_submit_bio(), it has to take a
  different path.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c   | 244 +++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.c |  20 +---
 fs/btrfs/volumes.h |  16 +++
 3 files changed, 253 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index f37b9fd30595..4ca502fa0b40 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -174,6 +174,7 @@ struct scrub_ctx {
 	struct scrub_stripe *stripes;
 	struct scrub_stripe *raid56_data_stripes;
+	struct map_lookup *map;
 	struct btrfs_fs_info *fs_info;
 	int first_free;
 	int cur_stripe;
@@ -846,6 +847,130 @@ static void scrub_stripe_submit_repair_read(struct scrub_stripe *stripe,
 	}
 }
 
+static void raid56_scrub_wait_endio(struct bio *bio)
+{
+	complete(bio->bi_private);
+}
+
+/*
+ * The read repair path of scrub_logical.
+ *
+ * This uses the full stripe contents as a cache, so we can afford to do
+ * synchronous repair here, as everything is done in memory.
+ */
+static void submit_cached_recover_bio(struct scrub_stripe *first_stripe,
+				      int group_stripes,
+				      struct scrub_stripe *stripe,
+				      struct bio *bio,
+				      int mirror)
+{
+	DECLARE_COMPLETION_ONSTACK(io_done);
+	struct btrfs_fs_info *fs_info = first_stripe->bg->fs_info;
+	struct btrfs_io_context *bioc = NULL;
+	struct btrfs_raid_bio *rbio;
+	u64 bio_logical = bio->bi_iter.bi_sector << SECTOR_SHIFT;
+	u32 bio_size = bio->bi_iter.bi_size;
+	u64 length = fs_info->sectorsize;
+	unsigned long bio_bitmap = 0;
+	int sector_start = calc_sector_number(stripe, bio_first_bvec_all(bio));
+	int ret;
+
+	ASSERT(mirror > 1);
+	ASSERT(bio_size);
+	ASSERT(bio_logical >= first_stripe->logical);
+
+	bio->bi_private = &io_done;
+	bio->bi_end_io = raid56_scrub_wait_endio;
+	btrfs_bio_counter_inc_blocked(fs_info);
+	ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, first_stripe->logical,
+			      &length, &bioc, NULL, 0, 1);
+	if (ret < 0)
+		goto out;
+
+	rbio = raid56_parity_alloc_recover_rbio(bio, bioc, mirror);
+	btrfs_put_bioc(bioc);
+	if (!rbio) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	for (int i = 0; i < group_stripes; i++) {
+		struct scrub_stripe *stripe = first_stripe + i;
+
+		raid56_parity_cache_pages(rbio, stripe->pages, i);
+	}
+	raid56_parity_submit_recover_rbio(rbio);
+	wait_for_completion_io(&io_done);
+	ret = blk_status_to_errno(bio->bi_status);
+out:
+	btrfs_bio_counter_dec(fs_info);
+	if (ret < 0) {
+		bitmap_set(&stripe->io_error_bitmap, sector_start,
+			   bio_size >> fs_info->sectorsize_bits);
+		bitmap_set(&stripe->error_bitmap, sector_start,
+			   bio_size >> fs_info->sectorsize_bits);
+		return;
+	}
+	bitmap_set(&bio_bitmap, sector_start,
+		   bio_size >> fs_info->sectorsize_bits);
+	/* Verify the repaired sector.
*/ + scrub_verify_one_stripe(stripe, bio_bitmap); +} + +static void scrub_submit_cached_raid56_repair_read( + struct scrub_stripe *first_stripe, + int group_stripes, + struct scrub_stripe *stripe, + int mirror) +{ + struct btrfs_fs_info *fs_info = stripe->bg->fs_info; + struct bio *bio = NULL; + int i; + + ASSERT(stripe->mirror_num >= 1); + ASSERT(atomic_read(&stripe->pending_io) == 0); + + /* + * Here we will repair every sector even if it's not covered by an + * extent. + * This is to handle missing device, so that we won't re-generate + * P/Q using garbage. + */ + for_each_set_bit(i, &stripe->error_bitmap, stripe->nr_sectors) { + struct page *page; + int pgoff; + int ret; + + page = scrub_stripe_get_page(stripe, i); + pgoff = scrub_stripe_get_page_offset(stripe, i); + + /* The current sector cannot be merged, submit the bio. */ + if (bio && i > 0 && !test_bit(i - 1, &stripe->error_bitmap)) { + ASSERT(bio->bi_iter.bi_size); + submit_cached_recover_bio(first_stripe, group_stripes, + stripe, bio, mirror); + bio_put(bio); + bio = NULL; + } + + if (!bio) { + bio = bio_alloc(NULL, stripe->nr_sectors, REQ_OP_READ, + GFP_NOFS); + bio->bi_iter.bi_sector = (stripe->logical + + (i << fs_info->sectorsize_bits)) >> SECTOR_SHIFT; + } + + ret = bio_add_page(bio, page, fs_info->sectorsize, pgoff); + ASSERT(ret == fs_info->sectorsize); + } + if (bio) { + ASSERT(bio->bi_iter.bi_size); + submit_cached_recover_bio(first_stripe, group_stripes, stripe, + bio, mirror); + bio_put(bio); + } +} + static void scrub_stripe_report_errors(struct scrub_ctx *sctx, struct scrub_stripe *stripe) { @@ -1657,6 +1782,94 @@ static bool stripe_has_metadata_error(struct scrub_stripe *stripe) return false; } +static int repair_raid56_full_stripe(struct scrub_ctx *sctx, int start_stripe, + int group_stripes) +{ + const int tolerance = btrfs_map_num_copies(sctx->map); + struct scrub_stripe *first_stripe = &sctx->stripes[start_stripe]; + struct scrub_stripe *p_stripe; + unsigned long extents_bitmap; + int data_stripes = nr_data_stripes(sctx->map); + bool has_error = false; + int ret = 0; + + ASSERT(start_stripe + group_stripes <= sctx->cur_stripe); + ASSERT(sctx->map); + + /* Zoned fs doesn't support RAID56 yet. */ + ASSERT(!btrfs_is_zoned(sctx->fs_info)); + + p_stripe = first_stripe + data_stripes; + + bitmap_copy(&extents_bitmap, &p_stripe->extent_sector_bitmap, + p_stripe->nr_sectors); + + for (int stripe_nr = 0; stripe_nr < data_stripes; stripe_nr++) { + struct scrub_stripe *stripe = first_stripe + stripe_nr; + unsigned long final_errors = 0; + + for (int mirror = 2; mirror <= tolerance; mirror++) { + scrub_submit_cached_raid56_repair_read(first_stripe, + group_stripes, stripe, mirror); + } + bitmap_and(&final_errors, &stripe->error_bitmap, + &stripe->extent_sector_bitmap, stripe->nr_sectors); + if (!bitmap_empty(&final_errors, stripe->nr_sectors)) { + btrfs_err_rl(sctx->fs_info, +"unrepaired sector found, full_stripe %llu stripe_num %u error_sectors %*pbl", + first_stripe->logical, stripe_nr, + stripe->nr_sectors, &final_errors); + has_error = true; + } + /* + * Here we continue repairing the remaining data stripes, + * This is to update the accounting to have a correct view + * of the errors we have. + */ + } + if (has_error) { + btrfs_err(sctx->fs_info, "failed to repair full stripe %llu", + first_stripe->logical); + ret = -EIO; + goto report; + } + + /* Submit the write for repaired data sectors. 
*/ + if (!sctx->readonly) { + for (int stripe_nr = 0; stripe_nr < data_stripes; stripe_nr++) { + struct scrub_stripe *stripe = first_stripe + stripe_nr; + unsigned long repaired; + + bitmap_andnot(&repaired, &stripe->init_error_bitmap, + &stripe->error_bitmap, + stripe->nr_sectors); + scrub_write_sectors(sctx, stripe, repaired, false); + } + } + /* Wait for above writeback to finish. */ + for (int stripe_nr = 0; stripe_nr < data_stripes; stripe_nr++) { + struct scrub_stripe *stripe = first_stripe + stripe_nr; + + wait_scrub_stripe_io(stripe); + } + + /* Still need to scrub P/Q stripes. */ + ret = -EOPNOTSUPP; +report: + /* + * Update the accounting for data stripes. + * + * Unfortunately the current scrub structure doesn't have the extra + * members to report errors about P/Q sectors. + */ + for (int stripe_nr = 0; stripe_nr < data_stripes; stripe_nr++) { + struct scrub_stripe *stripe = first_stripe + stripe_nr; + + scrub_stripe_report_errors(sctx, stripe); + } + return ret; +} + /* * Unlike the per-device repair, we have all mirrors read out already. * @@ -1750,8 +1963,14 @@ static int handle_logical_stripes(struct scrub_ctx *sctx, struct btrfs_block_group *bg) { const int nr_stripes = sctx->cur_stripe; - const int group_stripes = btrfs_bg_type_to_factor(bg->flags); + const bool is_raid56 = bg->flags & BTRFS_BLOCK_GROUP_RAID56_MASK; + const int group_stripes = is_raid56 ? sctx->map->num_stripes : + btrfs_bg_type_to_factor(bg->flags); struct scrub_stripe *stripe; + int ret = 0; + + /* For scrub_logical, sctx->map should be set. */ + ASSERT(sctx->map); for (int i = 0; i < nr_stripes; i++) { stripe = &sctx->stripes[i]; @@ -1773,11 +1992,21 @@ static int handle_logical_stripes(struct scrub_ctx *sctx, bitmap_weight(&stripe->meta_error_bitmap, stripe->nr_sectors); } - for (int i = 0; i < nr_stripes; i += group_stripes) - repair_one_mirror_group(sctx, i, group_stripes); + + for (int i = 0; i < nr_stripes; i += group_stripes) { + int tmp; + + if (is_raid56) { + tmp = repair_raid56_full_stripe(sctx, i, group_stripes); + if (!ret) + ret = tmp; + } else { + repair_one_mirror_group(sctx, i, group_stripes); + } + } sctx->cur_stripe = 0; - return 0; + return ret; } static int flush_scrub_stripes(struct scrub_ctx *sctx, @@ -1874,11 +2103,6 @@ static int flush_scrub_stripes(struct scrub_ctx *sctx, return ret; } -static void raid56_scrub_wait_endio(struct bio *bio) -{ - complete(bio->bi_private); -} - static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *bg, struct btrfs_device *dev, int mirror_num, u64 logical, u32 length, u64 physical) @@ -3317,6 +3541,7 @@ static int scrub_logical_one_chunk(struct scrub_ctx *sctx, spin_unlock(&bg->lock); return ret; } + sctx->map = em->map_lookup; scrub_blocked_if_needed(fs_info); @@ -3368,6 +3593,7 @@ static int scrub_logical_one_chunk(struct scrub_ctx *sctx, ret = flush_ret; free_scrub_stripes(sctx); free_extent_map(em); + sctx->map = NULL; return ret; } diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index c22007bd830b..1f79180dab83 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -5752,22 +5752,6 @@ void btrfs_mapping_tree_free(struct extent_map_tree *tree) } } -static int map_num_copies(const struct map_lookup *map) -{ - if (map->type & BTRFS_BLOCK_GROUP_RAID5) - return 2; - if (map->type & BTRFS_BLOCK_GROUP_RAID6) - /* - * There could be two corrupted data stripes, we need - * to loop retry in order to rebuild the correct data. - * - * Fail a stripe at a time on every retry except the - * stripe under reconstruction. 
-	 */
-	return map->num_stripes;
-	return btrfs_bg_type_to_factor(map->type);
-}
-
 int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 {
 	struct extent_map *em;
@@ -5783,7 +5767,7 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	 */
 		return 1;
 
-	ret = map_num_copies(em->map_lookup);
+	ret = btrfs_map_num_copies(em->map_lookup);
 	free_extent_map(em);
 	return ret;
 }
@@ -6294,7 +6278,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 		return PTR_ERR(em);
 
 	map = em->map_lookup;
-	if (mirror_num > map_num_copies(map)) {
+	if (mirror_num > btrfs_map_num_copies(map)) {
 		ret = -EINVAL;
 		goto out_free_em;
 	}
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f4f410550306..0753675a2c5b 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -752,4 +752,20 @@ bool btrfs_repair_one_zone(struct btrfs_fs_info *fs_info, u64 logical);
 
 bool btrfs_pinned_by_swapfile(struct btrfs_fs_info *fs_info, void *ptr);
 
+static inline int btrfs_map_num_copies(struct map_lookup *map)
+{
+	if (map->type & BTRFS_BLOCK_GROUP_RAID5)
+		return 2;
+	if (map->type & BTRFS_BLOCK_GROUP_RAID6)
+		/*
+		 * There could be two corrupted data stripes, we need
+		 * to loop retry in order to rebuild the correct data.
+		 *
+		 * Fail a stripe at a time on every retry except the
+		 * stripe under reconstruction.
+		 */
+		return map->num_stripes;
+	return btrfs_bg_type_to_factor(map->type);
+}
+
 #endif

From patchwork Mon Jul 3 07:32:38 2023
From: Qu Wenruo
To: linux-btrfs@vger.kernel.org
Subject: [PATCH 14/14] btrfs: scrub: implement the RAID56 P/Q scrub code
Date: Mon, 3 Jul 2023 15:32:38 +0800
Message-ID: <8538a4b8c0f2cfa762863ae735a774f903090acb.1688368617.git.wqu@suse.com>

For RAID56 P/Q stripe verification, we already have an existing path
inside scrub_raid56_parity_stripe(). Although the existing path only
verifies one P/Q stripe, with the recent changes to scrub_rbio() we only
need to pass a NULL @scrub_dev to raid56_parity_alloc_scrub_rbio(), and
then we can verify both P/Q stripes in one go.

So here we introduce a helper, submit_raid56_pq_scrub_bio(), to do the
necessary P/Q stripe verification and repair work.

The new helper is utilized by both scrub_logical and the existing
per-device scrub, and both use the repaired data stripes to reduce IO for
RAID56 scrub.

Signed-off-by: Qu Wenruo
---
 fs/btrfs/scrub.c | 106 +++++++++++++++++++++++++++--------------------
 1 file changed, 62 insertions(+), 44 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 4ca502fa0b40..2a882a07b388 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1782,6 +1782,58 @@ static bool stripe_has_metadata_error(struct scrub_stripe *stripe)
 	return false;
 }
 
+static int submit_raid56_pq_scrub_bio(struct scrub_ctx *sctx,
+				      struct map_lookup *map,
+				      struct scrub_stripe *first_data_stripe,
+				      int group_stripes,
+				      struct btrfs_device *scrub_dev,
+				      unsigned long extent_bitmap)
+{
+	DECLARE_COMPLETION_ONSTACK(io_done);
+	struct btrfs_fs_info *fs_info = sctx->fs_info;
+	struct btrfs_io_context *bioc = NULL;
+	struct btrfs_raid_bio *rbio;
+	struct bio *bio;
+	const int data_stripes = nr_data_stripes(map);
+	u64 full_stripe_start = first_data_stripe->logical;
+	u64 length = btrfs_stripe_nr_to_offset(data_stripes);
+	int ret;
+
+	bio = bio_alloc(NULL, 1, REQ_OP_READ, GFP_NOFS);
+	bio->bi_iter.bi_sector = full_stripe_start >> SECTOR_SHIFT;
+	bio->bi_private = &io_done;
+	bio->bi_end_io = raid56_scrub_wait_endio;
+
+	btrfs_bio_counter_inc_blocked(fs_info);
+	ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, full_stripe_start,
+			      &length, &bioc, NULL, NULL, 1);
+	if (ret < 0) {
+		btrfs_put_bioc(bioc);
+		btrfs_bio_counter_dec(fs_info);
+		return ret;
+	}
+	rbio = raid56_parity_alloc_scrub_rbio(bio, bioc, scrub_dev, &extent_bitmap);
+	btrfs_put_bioc(bioc);
+	if (!rbio) {
+		ret = -ENOMEM;
+		btrfs_bio_counter_dec(fs_info);
+		return ret;
+	}
+	/* Use the recovered stripes as cache to avoid reading them from disk again. */
+	ASSERT(group_stripes <= map->num_stripes);
+	for (int i = 0; i < group_stripes; i++) {
+		struct scrub_stripe *stripe = first_data_stripe + i;
+
+		raid56_parity_cache_pages(rbio, stripe->pages, i);
+	}
+	raid56_parity_submit_scrub_rbio(rbio);
+	wait_for_completion_io(&io_done);
+	ret = blk_status_to_errno(bio->bi_status);
+	bio_put(bio);
+	btrfs_bio_counter_dec(fs_info);
+	return ret;
+}
+
 static int repair_raid56_full_stripe(struct scrub_ctx *sctx, int start_stripe,
 				     int group_stripes)
 {
@@ -1853,8 +1905,14 @@ static int repair_raid56_full_stripe(struct scrub_ctx *sctx, int start_stripe,
 		wait_scrub_stripe_io(stripe);
 	}
 
-	/* Still need to scrub P/Q stripes. */
-	ret = -EOPNOTSUPP;
+	/*
+	 * All data stripes are repaired, now we can generate and verify the
+	 * P/Q stripes.
+	 *
+	 * Pass NULL as @scrub_dev, so that both P/Q stripes get verified.
+	 */
+	ret = submit_raid56_pq_scrub_bio(sctx, sctx->map, first_stripe,
+					 sctx->map->num_stripes, NULL,
+					 extents_bitmap);
 report:
 	/*
 	 * Update the accounting for data stripes.
@@ -2141,14 +2199,10 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
 {
 	DECLARE_COMPLETION_ONSTACK(io_done);
 	struct btrfs_fs_info *fs_info = sctx->fs_info;
-	struct btrfs_raid_bio *rbio;
-	struct btrfs_io_context *bioc = NULL;
-	struct bio *bio;
 	struct scrub_stripe *stripe;
 	bool all_empty = true;
 	const int data_stripes = nr_data_stripes(map);
 	unsigned long extent_bitmap = 0;
-	u64 length = btrfs_stripe_nr_to_offset(data_stripes);
 	int ret;
 
 	ASSERT(sctx->raid56_data_stripes);
@@ -2261,38 +2315,8 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
 	}
 
 	/* Now we can check and regenerate the P/Q stripe. */
-	bio = bio_alloc(NULL, 1, REQ_OP_READ, GFP_NOFS);
-	bio->bi_iter.bi_sector = full_stripe_start >> SECTOR_SHIFT;
-	bio->bi_private = &io_done;
-	bio->bi_end_io = raid56_scrub_wait_endio;
-
-	btrfs_bio_counter_inc_blocked(fs_info);
-	ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, full_stripe_start,
-			      &length, &bioc, NULL, NULL, 1);
-	if (ret < 0) {
-		btrfs_put_bioc(bioc);
-		btrfs_bio_counter_dec(fs_info);
-		goto out;
-	}
-	rbio = raid56_parity_alloc_scrub_rbio(bio, bioc, scrub_dev, &extent_bitmap);
-	btrfs_put_bioc(bioc);
-	if (!rbio) {
-		ret = -ENOMEM;
-		btrfs_bio_counter_dec(fs_info);
-		goto out;
-	}
-	/* Use the recovered stripes as cache to avoid reading them from disk again. */
-	for (int i = 0; i < data_stripes; i++) {
-		stripe = &sctx->raid56_data_stripes[i];
-
-		raid56_parity_cache_pages(rbio, stripe->pages, i);
-	}
-	raid56_parity_submit_scrub_rbio(rbio);
-	wait_for_completion_io(&io_done);
-	ret = blk_status_to_errno(bio->bi_status);
-	bio_put(bio);
-	btrfs_bio_counter_dec(fs_info);
-
+	ret = submit_raid56_pq_scrub_bio(sctx, map, sctx->raid56_data_stripes,
+					 data_stripes, scrub_dev, extent_bitmap);
 out:
 	return ret;
 }
@@ -3545,12 +3569,6 @@ static int scrub_logical_one_chunk(struct scrub_ctx *sctx,
 
 	scrub_blocked_if_needed(fs_info);
 
-	/* RAID56 not yet supported. */
-	if (bg->flags & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-		ret = -EOPNOTSUPP;
-		goto out;
-	}
-
 	nr_stripes = SCRUB_STRIPES_PER_SCTX * nr_copies;
 	ret = alloc_scrub_stripes(sctx, nr_stripes);
 	if (ret < 0