Message ID | 20230724023606.91107-1-wqu@suse.com (mailing list archive)
State      | New, archived
Series     | [v2] btrfs: add a test case to make sure scrub can repair parity corruption
On 24/7/23 10:36, Qu Wenruo wrote:
[...]
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2023 YOUR NAME HERE. All Rights Reserved.

NIT: Actual name is required here.

Rest of the code looks good.

Thanks.
On Mon, Jul 24, 2023 at 03:48:52PM +0800, Anand Jain wrote:
> On 24/7/23 10:36, Qu Wenruo wrote:
> [...]
> > +# Copyright (c) 2023 YOUR NAME HERE. All Rights Reserved.
>
> NIT: Actual name is required here.
>
> Rest of the code looks good.

If there are no more review points, I can help to
s/YOUR NAME HERE/SUSE Linux Products GmbH
when I merge it.

BTW, do you mean there's a RVB from you?

Thanks,
Zorro
On 25/07/2023 18:36, Zorro Lang wrote:
> On Mon, Jul 24, 2023 at 03:48:52PM +0800, Anand Jain wrote:
> [...]
> If there are no more review points, I can help to
> s/YOUR NAME HERE/SUSE Linux Products GmbH
> when I merge it.
>
> BTW, do you mean there's a RVB from you?

Qu has sent v3 for this patch and there is my RVB as well.

Thanks.
On Tue, Jul 25, 2023 at 07:13:02PM +0800, Anand Jain wrote:
> On 25/07/2023 18:36, Zorro Lang wrote:
> [...]
> > BTW, do you mean there's a RVB from you?
>
> Qu has sent v3 for this patch and there is my RVB as well.
> Thanks.

OK, got it, thanks :)
diff --git a/tests/btrfs/297 b/tests/btrfs/297
new file mode 100755
index 00000000..852c3ace
--- /dev/null
+++ b/tests/btrfs/297
@@ -0,0 +1,85 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2023 YOUR NAME HERE. All Rights Reserved.
+#
+# FS QA Test 297
+#
+# Make sure btrfs scrub can fix parity stripe corruption
+#
+. ./common/preamble
+_begin_fstest auto quick raid scrub
+
+. ./common/filter
+
+_supported_fs btrfs
+_require_odirect
+_require_non_zoned_device "${SCRATCH_DEV}"
+_require_scratch_dev_pool 3
+_fixed_by_kernel_commit 486c737f7fdc \
+	"btrfs: raid56: always verify the P/Q contents for scrub"
+
+workload()
+{
+	local profile=$1
+	local nr_devs=$2
+
+	echo "=== Testing $nr_devs devices $profile ===" >> $seqres.full
+	_scratch_dev_pool_get $nr_devs
+
+	_scratch_pool_mkfs -d $profile -m single >> $seqres.full 2>&1
+	# Use v2 space cache to prevent v1 space cache affecting
+	# the result.
+	_scratch_mount -o space_cache=v2
+
+	# Create one 64K extent which would cover one data stripe.
+	$XFS_IO_PROG -f -d -c "pwrite -S 0xaa -b 64K 0 64K" \
+		"$SCRATCH_MNT/foobar" > /dev/null
+	sync
+
+	# Corrupt the P/Q stripe
+	local logical=$(_btrfs_get_first_logical $SCRATCH_MNT/foobar)
+
+	# The 2nd copy is pointed to P stripe directly.
+	physical_p=$(_btrfs_get_physical ${logical} 2)
+	devpath_p=$(_btrfs_get_device_path ${logical} 2)
+
+	_scratch_unmount
+
+	echo "Corrupt stripe P at devpath $devpath_p physical $physical_p" \
+		>> $seqres.full
+	$XFS_IO_PROG -d -c "pwrite -S 0xff -b 64K $physical_p 64K" $devpath_p \
+		> /dev/null
+
+	# Do a scrub to try repair the P stripe.
+	_scratch_mount -o space_cache=v2
+	$BTRFS_UTIL_PROG scrub start -BdR $SCRATCH_MNT >> $seqres.full 2>&1
+	_scratch_unmount
+
+	# Verify the repaired content directly
+	local output=$($XFS_IO_PROG -c "pread -qv $physical_p 16" $devpath_p | _filter_xfs_io_offset)
+	local expect="XXXXXXXX: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................"
+
+	echo "The first 16 bytes of parity stripe after scrub:" >> $seqres.full
+	echo $output >> $seqres.full
+	if [ "$output" != "$expect" ]; then
+		echo "Unexpected parity content"
+		echo "has:"
+		echo "$output"
+		echo "expect"
+		echo "$expect"
+	fi
+
+	# Last safenet, let btrfs check --check-data-csum do an offline scrub.
+	$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
+	if [ $? -ne 0 ]; then
+		echo "Error detected after the scrub"
+	fi
+	_scratch_dev_pool_put
+}
+
+workload raid5 2
+workload raid6 3
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/btrfs/297.out b/tests/btrfs/297.out
new file mode 100644
index 00000000..41c373c4
--- /dev/null
+++ b/tests/btrfs/297.out
@@ -0,0 +1,2 @@
+QA output created by 297
+Silence is golden
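For anyone who wants to reproduce the regression locally, here is a minimal sketch of an fstests run. The device paths in local.config are hypothetical placeholders and need to be adapted to spare block devices on the test machine; the test requires a scratch pool of at least three devices because of _require_scratch_dev_pool 3.

    # local.config sketch -- the device paths below are placeholders
    export TEST_DEV=/dev/vdb
    export TEST_DIR=/mnt/test
    export SCRATCH_MNT=/mnt/scratch
    # _require_scratch_dev_pool 3 needs at least three pool devices
    export SCRATCH_DEV_POOL="/dev/vdc /dev/vdd /dev/vde"

    # from the top of the fstests checkout
    ./check btrfs/297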
There is a kernel regression caused by commit 75b470332965 ("btrfs:
raid56: migrate recovery and scrub recovery path to use error_bitmap"),
which leads to scrub not repairing corrupted parity stripes.

So here we add a test case to verify the P/Q stripe scrub behavior by:

- Create a RAID5 or RAID6 btrfs with the minimal number of devices
  This means 2 devices for RAID5, and 3 devices for RAID6.
  This results in the parity stripe being a mirror of the only data
  stripe.

  And since we have control of the content of the data stripes, the
  content of the P stripe is also fixed.

- Create a 64K file
  The file covers one data stripe.

- Corrupt the P stripe

- Scrub the fs
  If scrub is working, the P stripe would be repaired.

  Unfortunately scrub can not report any P/Q corruption, limited by its
  reporting structure, so we can not use the return value of scrub to
  determine whether we repaired anything.

- Verify the content of the P stripe

- Use "btrfs check --check-data-csum" to double check

With the above steps, we can verify that the P stripe is properly fixed.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Rebase to the latest misc-next
- Use the space_cache=v2 mount option instead of nospace_cache
  New features like block group tree and extent tree v2 require the v2
  cache
- Fix a whitespace error
---
 tests/btrfs/297     | 85 +++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/297.out |  2 ++
 2 files changed, 87 insertions(+)
 create mode 100755 tests/btrfs/297
 create mode 100644 tests/btrfs/297.out
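A quick note on why the expected parity content is known in advance: the P stripe is the XOR of all data stripes in a full stripe, so with exactly one data stripe (2-device RAID5, or 3-device RAID6 where the extra device only holds Q) the P stripe mirrors the data byte for byte. A tiny illustrative shell snippet, not part of the test itself:

    # With a single data stripe D1, P = D1 xor 0 = D1.
    d1=0xaa                    # byte pattern the test writes to the data stripe
    p=$(( d1 ^ 0x00 ))         # no other data stripes contribute to the XOR
    printf 'expected P byte: 0x%02x\n' "$p"   # prints 0xaa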