Message ID | cover.1688368617.git.wqu@suse.com (mailing list archive)
---|---
Series | btrfs: scrub: introduce SCRUB_LOGICAL flag
On 03/07/2023 08:32, Qu Wenruo wrote:
> [CHANGELOG]
> RFC->v1:
> - Add RAID56 support
>   This is the biggest advantage of the new scrub mode.
>
> - More basic tests around various repair situations
>
> [REPO]
> Please fetch from the github repo:
> https://github.com/adam900710/linux/tree/scrub_logical
>
> This is based on David's misc-next, but still has one extra regression
> fix which is not yet merged. ("btrfs: raid56: always verify the P/Q
> contents for scrub")
>
> [BACKGROUND]
> Scrub originally works on a per-device basis; that means if we want to
> scrub the full fs, we would queue a scrub for each writeable device.
>
> This is normally fine, but some behavior is not ideal, like the
> following:
>
>              X       X+16K   X+32K
>  Mirror 1    |XXXXXXX|       |
>  Mirror 2    |       |XXXXXXX|
>
> When scrubbing mirror 1, we find that [X, X+16K) has corruption, so we
> read from mirror 2 and repair using the correct data.
>
> The same applies to range [X+16K, X+32K) of mirror 2, causing the good
> copy to be read twice, which may or may not be a big deal.
>
> But the problem can easily go crazy for RAID56, as its repair requires
> the full stripe to be read out, as does its P/Q verification, e.g.:
>
>                  X       X+16K   X+32K
>  Data stripe 1   |XXXXXXX|       |       |
>  Data stripe 2   |       |XXXXXXX|       |
>  Parity stripe   |       |       |XXXXXXX|
>
> In the above case, if we're scrubbing all mirrors, we would read the
> same contents again and again:
>
> Scrub on mirror 1:
> - Read data stripe 1 for the initial read.
> - Read data stripes 1 + 2 + parity for the rebuild.
>
> Scrub on mirror 2:
> - Read data stripe 2 for the initial read.
> - Read data stripes 1 + 2 + parity for the rebuild.
>
> Scrub on parity:
> - Read data stripes 1 + 2 for the data stripe verification.
> - Read data stripes 1 + 2 + parity for the data rebuild.
>   This step is already improved by the recent optimization to use
>   cached stripes.
> - Read the parity stripe for the final verification.
>
> So the data stripes are read *five* times, and the parity stripes
> *three* times.
>
> [ENHANCEMENT]
> Instead of the existing per-device scrub, this patchset introduces a
> new flag, SCRUB_LOGICAL, to do logical address space based scrub.
>
> Unlike per-device scrub, this new flag makes scrub a per-fs operation.
>
> This allows us to scrub the different mirrors in one go, and avoid
> unnecessary reads when repairing corruption.
>
> This means that, no matter the profile, the same data is always read
> just once.
>
> This also makes user space handling much simpler: just one ioctl to
> scrub the full fs, without the current multi-thread scrub code.

I have a comment on terminology. If I understand correctly, this flag
changes scrub from an operation performed in parallel on all devices to
one done sequentially, with less parallelism.

The original code scrubs every device at the same time. In very rough
terms, for a filesystem with more devices than copies, the duration of
a scrub with no errors is the time taken to read every block on the
most occupied device. The other disks will finish earlier.

In the same case, the new code will take the time needed to read one
copy of every block (not just those on the most occupied disk). It is
not clear to me how much parallelism will occur in this case.

I am not saying it isn't worth doing, but it may be best to explain it
in terms of parallel vs. sequential rather than physical vs. logical,
certainly in making the user documentation and the scrub command line
clear to the user, and possibly even in the naming of the flag (e.g.
SCRUB_SEQUENTIAL instead of SCRUB_LOGICAL).

> [TODO]
> - More testing
>   Currently only functional tests have been done, no stress tests yet.
>
> - Better progs integration
>   In theory we can silently try SCRUB_LOGICAL first and, if the kernel
>   doesn't support it, fall back to the old multi-device scrub.

Maybe not, if my understanding is correct. On filesystems with more
disks than copies, the new code may take noticeably longer?

Or do I misunderstand?

Graham

> But for current testing, a dedicated option is still assigned to the
> scrub subcommand.
>
> And currently it only supports the full report, no summary nor scrub
> file support.
>
> Qu Wenruo (14):
>   btrfs: raid56: remove unnecessary parameter for
>     raid56_parity_alloc_scrub_rbio()
>   btrfs: raid56: allow scrub operation to update both P and Q stripes
>   btrfs: raid56: allow caching P/Q stripes
>   btrfs: raid56: add the interfaces to submit recovery rbio
>   btrfs: add the ability to read P/Q stripes directly
>   btrfs: scrub: make scrub_ctx::stripes dynamically allocated
>   btrfs: scrub: introduce the skeleton for logical-scrub
>   btrfs: scrub: extract the common preparation before scrubbing a
>     block group
>   btrfs: scrub: implement the chunk iteration code for scrub_logical
>   btrfs: scrub: implement the basic extent iteration code
>   btrfs: scrub: implement the repair part of scrub logical
>   btrfs: scrub: add RAID56 support for queue_scrub_logical_stripes()
>   btrfs: scrub: introduce the RAID56 data recovery path for scrub
>     logical
>   btrfs: scrub: implement the RAID56 P/Q scrub code
>
>  fs/btrfs/disk-io.c         |    1 +
>  fs/btrfs/fs.h              |   12 +
>  fs/btrfs/ioctl.c           |    6 +-
>  fs/btrfs/raid56.c          |  134 +++-
>  fs/btrfs/raid56.h          |   17 +-
>  fs/btrfs/scrub.c           | 1198 ++++++++++++++++++++++++++++++------
>  fs/btrfs/scrub.h           |    2 +
>  fs/btrfs/volumes.c         |   50 +-
>  fs/btrfs/volumes.h         |   16 +
>  include/uapi/linux/btrfs.h |   11 +-
>  10 files changed, 1206 insertions(+), 241 deletions(-)
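The "one ioctl to scrub the full fs" point and the progs fallback proposed in the quoted [TODO] are easy to picture with a minimal user-space sketch. It relies only on the existing BTRFS_IOC_SCRUB ioctl and struct btrfs_ioctl_scrub_args from <linux/btrfs.h>; the BTRFS_SCRUB_LOGICAL constant name and value, the devid being ignored, and the errno an older kernel would return for the unknown flag are all assumptions made for illustration, not the patchset's final UAPI.

```c
/*
 * Minimal sketch (not the patchset's actual progs code) of trying the
 * proposed logical scrub first and falling back to the existing
 * per-device scrub.  BTRFS_SCRUB_LOGICAL's name and value are assumed.
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

#ifndef BTRFS_SCRUB_LOGICAL
#define BTRFS_SCRUB_LOGICAL (1ULL << 1)	/* hypothetical flag value */
#endif

static int scrub_logical_full_fs(int fs_fd)
{
	struct btrfs_ioctl_scrub_args args;

	memset(&args, 0, sizeof(args));
	args.start = 0;
	args.end = (__u64)-1;		/* whole logical address space */
	args.flags = BTRFS_SCRUB_LOGICAL;
	/* args.devid is presumably ignored for a per-fs (logical) scrub. */

	if (ioctl(fs_fd, BTRFS_IOC_SCRUB, &args) < 0)
		return -errno;
	return 0;
}

int main(int argc, char **argv)
{
	int fd, ret;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	ret = scrub_logical_full_fs(fd);
	if (ret == -ENOTTY || ret == -EOPNOTSUPP) {
		/*
		 * Older kernel: fall back to the existing per-device scrub,
		 * i.e. one BTRFS_IOC_SCRUB call per writeable device, as
		 * btrfs-progs does today (fallback body omitted here).
		 */
		fprintf(stderr, "SCRUB_LOGICAL not supported, falling back\n");
		ret = 0;
	} else if (ret < 0) {
		fprintf(stderr, "scrub failed: %s\n", strerror(-ret));
	}
	close(fd);
	return ret ? 1 : 0;
}
```

If the flag is rejected, the fallback would simply be today's behaviour: one BTRFS_IOC_SCRUB call per writeable device, as sketched after Qu's reply below.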
On 2023/7/3 20:58, Graham Cobb wrote:
> I have a comment on terminology. If I understand correctly, this flag
> changes scrub from an operation performed in parallel on all devices to
> one done sequentially, with less parallelism.

It depends.

There are several different factors affecting the concurrency, even for
the mentioned cases, like the race between different devices for the
dev-extent and csum tree lookups.

In fact, for profiles where nr_devices > mirrors (aka RAID1*, DUP,
SINGLE), the differing scrub pace of the devices can lead to more
problems, like too many block groups being marked RO.

For this situation it's really hard to say what the impact is, but I'm
optimistic about the performance, and it will finally be determined by
real-world benchmarks.

Please remember that the worst-case profiles are mostly only used for
metadata, while for data it's way more common to go RAID0/RAID10, or
even RAID5/6 for adventurous users, and the new interface is way better
for those profiles.

>> [TODO]
>> - Better progs integration
>>   In theory we can silently try SCRUB_LOGICAL first and, if the
>>   kernel doesn't support it, fall back to the old multi-device scrub.
>
> Maybe not, if my understanding is correct. On filesystems with more
> disks than copies, the new code may take noticeably longer?

Not really.

I'd like to have some real-world benchmarks for these cases instead.

Thanks,
Qu
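For contrast, the per-device flow the thread keeps referring to looks roughly like the sketch below: one user-space thread per device, each blocking in its own BTRFS_IOC_SCRUB call. This is a simplified illustration, not the actual btrfs-progs code: device enumeration via BTRFS_IOC_FS_INFO / BTRFS_IOC_DEV_INFO, progress reporting and cancellation are all omitted, and trying devids 1..max while skipping -ENODEV stands in for real enumeration.

```c
/* Heavily simplified per-device scrub: one thread per candidate devid. */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

#define MAX_DEVS 64

struct scrub_job {
	int fs_fd;
	__u64 devid;
	int ret;
};

static void *scrub_one_device(void *arg)
{
	struct scrub_job *job = arg;
	struct btrfs_ioctl_scrub_args args;

	memset(&args, 0, sizeof(args));
	args.devid = job->devid;	/* scrub only this device */
	args.start = 0;
	args.end = (__u64)-1;

	/* Blocks until the scrub of this single device has finished. */
	if (ioctl(job->fs_fd, BTRFS_IOC_SCRUB, &args) < 0)
		job->ret = -errno;
	return NULL;
}

/* Kick off one scrub thread per candidate devid and wait for them all. */
static void scrub_all_devices(int fs_fd, unsigned int max_devid)
{
	struct scrub_job jobs[MAX_DEVS];
	pthread_t tids[MAX_DEVS];
	unsigned int i;

	if (max_devid > MAX_DEVS)
		max_devid = MAX_DEVS;

	for (i = 0; i < max_devid; i++) {
		jobs[i].fs_fd = fs_fd;
		jobs[i].devid = i + 1;	/* btrfs devids start at 1 */
		jobs[i].ret = 0;
		pthread_create(&tids[i], NULL, scrub_one_device, &jobs[i]);
	}
	for (i = 0; i < max_devid; i++) {
		pthread_join(tids[i], NULL);
		if (jobs[i].ret && jobs[i].ret != -ENODEV)
			fprintf(stderr, "scrub of devid %u failed: %s\n",
				i + 1, strerror(-jobs[i].ret));
	}
}
```

(Link with -lpthread.) Each thread drives its own device scrub in the kernel, which is where the concurrent dev-extent and csum tree lookups mentioned above come from; a single per-fs SCRUB_LOGICAL ioctl removes this multi-thread handling from user space entirely.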
On 03/07/2023 23:40, Qu Wenruo wrote:
> For this situation it's really hard to say what the impact is, but I'm
> optimistic about the performance, and it will finally be determined by
> real-world benchmarks.
>
> Please remember that the worst-case profiles are mostly only used for
> metadata, while for data it's way more common to go RAID0/RAID10, or
> even RAID5/6 for adventurous users, and the new interface is way better
> for those profiles.

It would be great to see some real measurements, of course, but I am
thinking of cases like RAID1 data on 3, 4 or even more disks. It feels
like a RAID1 filesystem with 4 disks might take up to twice as long
with logical-address-based scrub than with per-device scrub. And I
can't even begin to visualise how it would perform with RAID5 data over
10 disks, say.

But getting benchmarks and practical experience is obviously a good
idea. Just please don't take control of which algorithm to use away
from the system manager without being sure it is really better.

Graham
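The "up to twice as long" figure follows from a back-of-the-envelope model (purely illustrative, assuming n identical, equally full, bandwidth-bound disks holding RAID1 data, an error-free scrub, and letting k be the effective number of disks the logical walk keeps busy):

```latex
% n identical disks with per-disk bandwidth B, total physical usage P
% (so P/n per disk), RAID1 data (2 copies), no errors found.
\[
  T_{\text{per-device}} \approx \frac{P/n}{B},
  \qquad
  T_{\text{logical}} \approx \frac{P/2}{k\,B},
  \qquad
  \frac{T_{\text{logical}}}{T_{\text{per-device}}} = \frac{n}{2k}.
\]
% The logical scrub is slower exactly when k < n/2.  With n = 4 and a
% fully sequential walk (k = 1) the ratio is 2, matching the "up to
% twice as long" estimate; with a well balanced walk (k close to n) the
% logical scrub can even finish sooner, since it reads each block only
% once (P/2 bytes in total instead of P).
```

The crossover sits at k = n/2: a poorly parallelised logical walk loses, while a well balanced one wins because it reads each block only once.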
On Mon, Jul 03, 2023 at 03:32:24PM +0800, Qu Wenruo wrote:
> [CHANGELOG]
> RFC->v1:
> - Add RAID56 support
>   This is the biggest advantage of the new scrub mode.

Great, thanks. The new functionality will have to wait until the scrub
performance regression is fixed though.