| Message ID | 20230630013614.56667-1-wqu@suse.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | btrfs: add a test case to make sure scrub can repair parity corruption |
On 30/06/2023 09:36, Qu Wenruo wrote:
> There is a kernel regression caused by commit 75b470332965 ("btrfs:
> raid56: migrate recovery and scrub recovery path to use error_bitmap"),
> which leads to scrub not repairing corrupted parity stripes.
>
> So here we add a test case to verify the P/Q stripe scrub behavior by:
>
> - Create a RAID5 or RAID6 btrfs with a minimal number of devices
>   This means 2 devices for RAID5, and 3 devices for RAID6.
>   This would result in the parity stripe being a mirror of the only
>   data stripe.
>
>   And since we have control of the content of data stripes, the content
>   of the P stripe is also fixed.
>
> - Create a 64K file
>   The file would cover one data stripe.
>
> - Corrupt the P stripe
>
> - Scrub the fs
>   If scrub is working, the P stripe would be repaired.
>
>   Unfortunately scrub cannot report any P/Q corruption, limited by its
>   reporting structure.
>   So we cannot use the return value of scrub to determine if we
>   repaired anything.
>
> - Verify the content of the P stripe
>
> - Use "btrfs check --check-data-csum" to double check
>
> With the above steps, we can verify if the P stripe is properly fixed.
>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  tests/btrfs/294     | 86 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/294.out |  2 ++
>  2 files changed, 88 insertions(+)
>  create mode 100755 tests/btrfs/294
>  create mode 100644 tests/btrfs/294.out
>
> diff --git a/tests/btrfs/294 b/tests/btrfs/294
> new file mode 100755
> index 00000000..97b85988
> --- /dev/null
> +++ b/tests/btrfs/294
> @@ -0,0 +1,86 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2023 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 294
> +#
> +# Make sure btrfs scrub can fix parity stripe corruption
> +#
> +. ./common/preamble
> +_begin_fstest auto quick raid scrub
> +
> +. ./common/filter
> +
> +_supported_fs btrfs
> +_require_odirect
> +_require_non_zoned_device "${SCRATCH_DEV}"
> +_require_scratch_dev_pool 3
> +_fixed_by_kernel_commit xxxxxxxxxxxx \
> +	"btrfs: raid56: always verify the P/Q contents for scrub"
> +
> +workload()
> +{
> +	local profile=$1
> +	local nr_devs=$2
> +
> +	echo "=== Testing $nr_devs devices $profile ===" >> $seqres.full
> +	_scratch_dev_pool_get $nr_devs
> +
> +	_scratch_pool_mkfs -d $profile -m single >> $seqres.full 2>&1
> +	# Disable space cache to prevent v1 space cache affecting
> +	# the result.

> +	_scratch_mount -o nospace_cache

[ 1562.331370] BTRFS error (device sdb3): cannot disable free space tree

Mount is failing for the -o nospace_cache option.

> +
> +	# Create one 64K extent which would cover one data stripe.
> +	$XFS_IO_PROG -f -d -c "pwrite -S 0xaa -b 64K 0 64K" \
> +		"$SCRATCH_MNT/foobar" > /dev/null
> +	sync
> +
> +	# Corrupt the P/Q stripe
> +	local logical=$(_btrfs_get_first_logical $SCRATCH_MNT/foobar)
> +
> +	# The 2nd copy points to the P stripe directly.
> +	physical_p=$(_btrfs_get_physical ${logical} 2)
> +	devpath_p=$(_btrfs_get_device_path ${logical} 2)
> +
> +	_scratch_unmount
> +
> +	echo "Corrupt stripe P at devpath $devpath_p physical $physical_p" \
> +		>> $seqres.full
> +	$XFS_IO_PROG -d -c "pwrite -S 0xff -b 64K $physical_p 64K" $devpath_p \
> +		> /dev/null
> +
> +	# Do a scrub to try to repair the P stripe.
> +	_scratch_mount -o nospace_cache
> +	$BTRFS_UTIL_PROG scrub start -BdR $SCRATCH_MNT >> $seqres.full 2>&1
> +	_scratch_unmount
> +
> +	# Verify the repaired content directly
> +	local output=$($XFS_IO_PROG -c "pread -qv $physical_p 16" $devpath_p | _filter_xfs_io_offset)
> +	local expect="XXXXXXXX: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................"
> +
> +	echo "The first 16 bytes of parity stripe after scrub:" >> $seqres.full
> +	echo $output >> $seqres.full
> +

White space error here.

Thanks.

> +	if [ "$output" != "$expect" ]; then
> +		echo "Unexpected parity content"
> +		echo "has:"
> +		echo "$output"
> +		echo "expect:"
> +		echo "$expect"
> +	fi
> +
> +	# As a last safety net, let btrfs check --check-data-csum do an
> +	# offline scrub.
> +	$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
> +	if [ $? -ne 0 ]; then
> +		echo "Error detected after the scrub"
> +	fi
> +	_scratch_dev_pool_put
> +}
> +
> +workload raid5 2
> +workload raid6 3
> +
> +echo "Silence is golden"
> +status=0
> +exit
> diff --git a/tests/btrfs/294.out b/tests/btrfs/294.out
> new file mode 100644
> index 00000000..c09531de
> --- /dev/null
> +++ b/tests/btrfs/294.out
> @@ -0,0 +1,2 @@
> +QA output created by 294
> +Silence is golden
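For background on why the test can hard-code the expected parity bytes: RAID5 parity is the byte-wise XOR of all data stripes in a horizontal stripe, so with only one data stripe (the 2-device RAID5 layout used here) the P stripe is byte-for-byte identical to the data. A minimal sketch of that identity, purely illustrative and not part of the patch:

```bash
# Minimal sketch: RAID5 parity is the XOR of all data stripes in a
# horizontal stripe.  With a single data stripe there is nothing else
# to XOR in, so P equals the data pattern -- 0xaa in this test.
data=0xaa
parity=$(( data ^ 0 ))                # only one data stripe contributes
printf 'expected parity byte: 0x%02x\n' "$parity"   # -> 0xaa
```

The same reasoning fixes the Q stripe for the 3-device RAID6 case, which is why corrupting the parity and comparing it back against the 0xaa pattern is a valid check.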
On 2023/7/22 21:24, Anand Jain wrote:
>
>
> On 30/06/2023 09:36, Qu Wenruo wrote:
>> There is a kernel regression caused by commit 75b470332965 ("btrfs:
>> raid56: migrate recovery and scrub recovery path to use error_bitmap"),
>> which leads to scrub not repairing corrupted parity stripes.
>>
>> So here we add a test case to verify the P/Q stripe scrub behavior by:
>>
>> - Create a RAID5 or RAID6 btrfs with a minimal number of devices
>>   This means 2 devices for RAID5, and 3 devices for RAID6.
>>   This would result in the parity stripe being a mirror of the only
>>   data stripe.
>>
>>   And since we have control of the content of data stripes, the content
>>   of the P stripe is also fixed.
>>
>> - Create a 64K file
>>   The file would cover one data stripe.
>>
>> - Corrupt the P stripe
>>
>> - Scrub the fs
>>   If scrub is working, the P stripe would be repaired.
>>
>>   Unfortunately scrub cannot report any P/Q corruption, limited by its
>>   reporting structure.
>>   So we cannot use the return value of scrub to determine if we
>>   repaired anything.
>>
>> - Verify the content of the P stripe
>>
>> - Use "btrfs check --check-data-csum" to double check
>>
>> With the above steps, we can verify if the P stripe is properly fixed.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  tests/btrfs/294     | 86 +++++++++++++++++++++++++++++++++++++++++++++
>>  tests/btrfs/294.out |  2 ++
>>  2 files changed, 88 insertions(+)
>>  create mode 100755 tests/btrfs/294
>>  create mode 100644 tests/btrfs/294.out
>>
>> diff --git a/tests/btrfs/294 b/tests/btrfs/294
>> new file mode 100755
>> index 00000000..97b85988
>> --- /dev/null
>> +++ b/tests/btrfs/294
>> @@ -0,0 +1,86 @@
>> +#! /bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +# Copyright (C) 2023 SUSE Linux Products GmbH. All Rights Reserved.
>> +#
>> +# FS QA Test 294
>> +#
>> +# Make sure btrfs scrub can fix parity stripe corruption
>> +#
>> +. ./common/preamble
>> +_begin_fstest auto quick raid scrub
>> +
>> +. ./common/filter
>> +
>> +_supported_fs btrfs
>> +_require_odirect
>> +_require_non_zoned_device "${SCRATCH_DEV}"
>> +_require_scratch_dev_pool 3
>> +_fixed_by_kernel_commit xxxxxxxxxxxx \
>> +	"btrfs: raid56: always verify the P/Q contents for scrub"
>> +
>> +workload()
>> +{
>> +	local profile=$1
>> +	local nr_devs=$2
>> +
>> +	echo "=== Testing $nr_devs devices $profile ===" >> $seqres.full
>> +	_scratch_dev_pool_get $nr_devs
>> +
>> +	_scratch_pool_mkfs -d $profile -m single >> $seqres.full 2>&1
>> +	# Disable space cache to prevent v1 space cache affecting
>> +	# the result.
>
>
>> +	_scratch_mount -o nospace_cache
>
> [ 1562.331370] BTRFS error (device sdb3): cannot disable free space tree
>
>
> Mount is failing for the -o nospace_cache option.

Indeed, I guess this is caused by the block group tree feature.

The original intention is just to prevent the v1 cache; considering v2
is required by several features, I'd switch it to explicitly require
the v2 cache instead.

Thanks,
Qu

>
>> +
>> +	# Create one 64K extent which would cover one data stripe.
>> +	$XFS_IO_PROG -f -d -c "pwrite -S 0xaa -b 64K 0 64K" \
>> +		"$SCRATCH_MNT/foobar" > /dev/null
>> +	sync
>> +
>> +	# Corrupt the P/Q stripe
>> +	local logical=$(_btrfs_get_first_logical $SCRATCH_MNT/foobar)
>> +
>> +	# The 2nd copy points to the P stripe directly.
>> +	physical_p=$(_btrfs_get_physical ${logical} 2)
>> +	devpath_p=$(_btrfs_get_device_path ${logical} 2)
>> +
>> +	_scratch_unmount
>> +
>> +	echo "Corrupt stripe P at devpath $devpath_p physical $physical_p" \
>> +		>> $seqres.full
>> +	$XFS_IO_PROG -d -c "pwrite -S 0xff -b 64K $physical_p 64K" $devpath_p \
>> +		> /dev/null
>> +
>> +	# Do a scrub to try to repair the P stripe.
>> +	_scratch_mount -o nospace_cache
>> +	$BTRFS_UTIL_PROG scrub start -BdR $SCRATCH_MNT >> $seqres.full 2>&1
>> +	_scratch_unmount
>> +
>> +	# Verify the repaired content directly
>> +	local output=$($XFS_IO_PROG -c "pread -qv $physical_p 16" $devpath_p | _filter_xfs_io_offset)
>> +	local expect="XXXXXXXX: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................"
>> +
>> +	echo "The first 16 bytes of parity stripe after scrub:" >> $seqres.full
>> +	echo $output >> $seqres.full
>> +
>
> White space error here.
>
> Thanks.
>
>> +	if [ "$output" != "$expect" ]; then
>> +		echo "Unexpected parity content"
>> +		echo "has:"
>> +		echo "$output"
>> +		echo "expect:"
>> +		echo "$expect"
>> +	fi
>> +
>> +	# As a last safety net, let btrfs check --check-data-csum do an
>> +	# offline scrub.
>> +	$BTRFS_UTIL_PROG check --check-data-csum $SCRATCH_DEV >> $seqres.full 2>&1
>> +	if [ $? -ne 0 ]; then
>> +		echo "Error detected after the scrub"
>> +	fi
>> +	_scratch_dev_pool_put
>> +}
>> +
>> +workload raid5 2
>> +workload raid6 3
>> +
>> +echo "Silence is golden"
>> +status=0
>> +exit
>> diff --git a/tests/btrfs/294.out b/tests/btrfs/294.out
>> new file mode 100644
>> index 00000000..c09531de
>> --- /dev/null
>> +++ b/tests/btrfs/294.out
>> @@ -0,0 +1,2 @@
>> +QA output created by 294
>> +Silence is golden
>
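Based on Qu's reply, the follow-up revision would presumably replace the nospace_cache mount option with an explicit request for the v2 space cache (the free space tree), which the block-group-tree feature requires. A hedged sketch of that change; the exact form in the next revision of the patch may differ:

```bash
# Hypothetical follow-up per Qu's reply above: mount with the v2 space
# cache (free space tree) explicitly, instead of trying to disable
# caching, which fails once the block-group-tree feature is enabled.
_scratch_mount -o space_cache=v2
```

The mount option itself (space_cache=v2) is a long-standing btrfs option, so the same two mount sites in the test could switch to it without other changes.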