diff mbox series

[v4] btrfs: a new test case to verify mount behavior with background remounting

Message ID 20241106054328.19842-1-wqu@suse.com (mailing list archive)
State New
Headers show
Series [v4] btrfs: a new test case to verify mount behavior with background remounting | expand

Commit Message

Qu Wenruo Nov. 6, 2024, 5:43 a.m. UTC
[BUG]
When there is a process in the background remounting a btrfs, switching
between RO/RW, then another process try to mount another subvolume of
the same btrfs read-only, we can hit a race causing the RW mount to fail
with -EBUSY:

[CAUSE]
During the btrfs mount, to support mounting different subvolumes with
different RO/RW flags, we have a small hack during the mount:

  Retry with matching RO flags if the initial mount fail with -EBUSY.

The problem is, during that retry we do not hold any super block lock
(s_umount), this meanings there can be a remount process changing the RO
flags of the original fs super block.

If so, we can have an EBUSY error during retry.
And this time we treat any failure as an error, without any retry and
cause the above EBUSY mount failure.

[FIX]
The fix is already sent to the mailing list.
The fix is to allow btrfs to have different RO flag between super block
and mount point during mount, and if the RO flag mismatch, reconfigure
the fs to RW with s_umount hold, so that there will be no race.

[TEST CASE]
The test case will create two processes:

- Remounting an existing subvolume mount point
  Switching between RO and RW

- Mounting another subvolume RW
  After a successful mount, unmount and retry.

This is enough to trigger the -EBUSY error in less than 5 seconds.
To be extra safe, the test case will run for 10 seconds at least, and
follow TIME_FACTOR for extra loads.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
---
Changelog:
v4:
- Add the missing reviewed-by tag from Filipe

v3:
- Also remove the default temporaray files in cleanup

- Unset mount/remount pid and avoid killing them if unset during cleanup

- Remove the mount points to avoid name conflicts

- Extra comments on the the final unmounts for both mount point and
  cleanup

v2:
- Better signal handling for both remount and mount workload
  Need to do the extra handling for both workload

- Wait for any child process to exit before cleanup

- Keep the stderr of the final rm command

- Keep the stderr of the unmount of $subv1_mount
  Which should always be mounted.

- Use "$UMOUNT_PROG" instead of direct "umount"

- Use "_mount" instead of direct "mount"

- Fix a typo of "block"
---
 tests/btrfs/325     | 107 ++++++++++++++++++++++++++++++++++++++++++++
 tests/btrfs/325.out |   2 +
 2 files changed, 109 insertions(+)
 create mode 100755 tests/btrfs/325
 create mode 100644 tests/btrfs/325.out

Comments

Zorro Lang Nov. 6, 2024, 9:35 a.m. UTC | #1
On Wed, Nov 06, 2024 at 04:13:28PM +1030, Qu Wenruo wrote:
> [BUG]
> When there is a process in the background remounting a btrfs, switching
> between RO/RW, then another process try to mount another subvolume of
> the same btrfs read-only, we can hit a race causing the RW mount to fail
> with -EBUSY:
> 
> [CAUSE]
> During the btrfs mount, to support mounting different subvolumes with
> different RO/RW flags, we have a small hack during the mount:
> 
>   Retry with matching RO flags if the initial mount fail with -EBUSY.
> 
> The problem is, during that retry we do not hold any super block lock
> (s_umount), this meanings there can be a remount process changing the RO
> flags of the original fs super block.
> 
> If so, we can have an EBUSY error during retry.
> And this time we treat any failure as an error, without any retry and
> cause the above EBUSY mount failure.
> 
> [FIX]
> The fix is already sent to the mailing list.
> The fix is to allow btrfs to have different RO flag between super block
> and mount point during mount, and if the RO flag mismatch, reconfigure
> the fs to RW with s_umount hold, so that there will be no race.
> 
> [TEST CASE]
> The test case will create two processes:
> 
> - Remounting an existing subvolume mount point
>   Switching between RO and RW
> 
> - Mounting another subvolume RW
>   After a successful mount, unmount and retry.
> 
> This is enough to trigger the -EBUSY error in less than 5 seconds.
> To be extra safe, the test case will run for 10 seconds at least, and
> follow TIME_FACTOR for extra loads.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> ---
> Changelog:
> v4:
> - Add the missing reviewed-by tag from Filipe
> 
> v3:
> - Also remove the default temporaray files in cleanup
> 
> - Unset mount/remount pid and avoid killing them if unset during cleanup
> 
> - Remove the mount points to avoid name conflicts
> 
> - Extra comments on the the final unmounts for both mount point and
>   cleanup
> 
> v2:
> - Better signal handling for both remount and mount workload
>   Need to do the extra handling for both workload
> 
> - Wait for any child process to exit before cleanup
> 
> - Keep the stderr of the final rm command
> 
> - Keep the stderr of the unmount of $subv1_mount
>   Which should always be mounted.
> 
> - Use "$UMOUNT_PROG" instead of direct "umount"
> 
> - Use "_mount" instead of direct "mount"
> 
> - Fix a typo of "block"
> ---

Thanks for this new test case, this version is good to me:

Reviewed-by: Zorro Lang <zlang@redhat.com>

>  tests/btrfs/325     | 107 ++++++++++++++++++++++++++++++++++++++++++++
>  tests/btrfs/325.out |   2 +
>  2 files changed, 109 insertions(+)
>  create mode 100755 tests/btrfs/325
>  create mode 100644 tests/btrfs/325.out
> 
> diff --git a/tests/btrfs/325 b/tests/btrfs/325
> new file mode 100755
> index 00000000..0f4984f1
> --- /dev/null
> +++ b/tests/btrfs/325
> @@ -0,0 +1,107 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved.
> +#
> +# FS QA Test 325
> +#
> +# Test that mounting a subvolume read-write will success, with another
> +# subvolume being remounted RO/RW at background
> +#
> +. ./common/preamble
> +_begin_fstest auto quick mount remount
> +
> +_fixed_by_kernel_commit xxxxxxxxxxxx \
> +	"btrfs: fix mount failure due to remount races"
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -rf -- $tmp.*
> +	[ -n "$mount_pid" ] && kill $mount_pid &> /dev/null
> +	[ -n "$remount_pid" ] && kill $remount_pid &> /dev/null
> +	wait
> +	$UMOUNT_PROG "$subv1_mount" &> /dev/null
> +	$UMOUNT_PROG "$subv2_mount" &> /dev/null
> +	rm -rf -- "$subv1_mount" "$subv2_mount"
> +}
> +
> +# For the extra mount points
> +_require_test
> +_require_scratch
> +
> +_scratch_mkfs >> $seqres.full 2>&1
> +_scratch_mount
> +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol1 >> $seqres.full
> +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol2 >> $seqres.full
> +_scratch_unmount
> +
> +subv1_mount="$TEST_DIR/subvol1_mount"
> +subv2_mount="$TEST_DIR/subvol2_mount"
> +rm -rf "$subv1_mount" "$subv2_mount"
> +mkdir -p "$subv1_mount"
> +mkdir -p "$subv2_mount"
> +_mount "$SCRATCH_DEV" "$subv1_mount" -o subvol=subvol1
> +
> +# Subv1 remount workload, mount the subv1 and switching it between
> +# RO and RW.
> +remount_workload()
> +{
> +	trap "wait; exit" SIGTERM
> +	while true; do
> +		_mount -o remount,ro "$subv1_mount"
> +		_mount -o remount,rw "$subv1_mount"
> +	done
> +}
> +
> +remount_workload &
> +remount_pid=$!
> +
> +# Subv2 rw mount workload
> +# For unpatched kernel, this will fail with -EBUSY.
> +#
> +# To maintain the per-subvolume RO/RW mount, if the existing btrfs is
> +# mounted RO, unpatched btrfs will retry with its RO flag reverted,
> +# then reconfigure the fs to RW.
> +#
> +# But such retry has no super block lock hold, thus if another remount
> +# process has already remounted the fs RW, the attempt will fail and
> +# return -EBUSY.
> +#
> +# Patched kernel will allow the superblock and mount point RO flags
> +# to differ, and then hold the s_umount to reconfigure the super block
> +# to RW, preventing any possible race.
> +mount_workload()
> +{
> +	trap "wait; exit" SIGTERM
> +	while true; do
> +		_mount "$SCRATCH_DEV" "$subv2_mount"
> +		$UMOUNT_PROG "$subv2_mount"
> +	done
> +}
> +
> +mount_workload &
> +mount_pid=$!
> +
> +sleep $(( 10 * $TIME_FACTOR ))
> +
> +kill "$remount_pid" "$mount_pid"
> +wait
> +unset remount_pid mount_pid
> +
> +# Subv1 is always mounted, thus the umount should never fail.
> +$UMOUNT_PROG "$subv1_mount"
> +
> +# Subv2 may have already been unmounted, so here we ignore all output.
> +# This may hide some errors like -EBUSY, but the next rm line would
> +# detect any still mounted subvolume so we're still safe.
> +$UMOUNT_PROG "$subv2_mount" &> /dev/null
> +
> +# If above unmount, especially subv2 is not properly unmounted,
> +# the rm should fail with some error message
> +rm -r -- "$subv1_mount" "$subv2_mount"
> +
> +echo "Silence is golden"
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/btrfs/325.out b/tests/btrfs/325.out
> new file mode 100644
> index 00000000..cf13795c
> --- /dev/null
> +++ b/tests/btrfs/325.out
> @@ -0,0 +1,2 @@
> +QA output created by 325
> +Silence is golden
> -- 
> 2.46.0
> 
>
Anand Jain Nov. 7, 2024, 7:17 a.m. UTC | #2
>> +# Subv2 may have already been unmounted, so here we ignore all output.
>> +# This may hide some errors like -EBUSY, but the next rm line would
>> +# detect any still mounted subvolume so we're still safe.
>> +$UMOUNT_PROG "$subv2_mount" &> /dev/null
>> +
>> +# If above unmount, especially subv2 is not properly unmounted,
>> +# the rm should fail with some error message
>> +rm -r -- "$subv1_mount" "$subv2_mount"

nit:
Nice, if unmounting $subvol2_mount fails, rm will fail as well.
To capture the unmount error why not redirect its output
to the test case’s full log:
  unmount $subvol2_mount >> $seqres.full.
If you're fine with it, Zorro can consider to change this
during the merge. Nevertheless, the code looks good.

   Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thx, Anand
Qu Wenruo Nov. 7, 2024, 7:43 a.m. UTC | #3
在 2024/11/7 17:47, Anand Jain 写道:
>
>
>
>>> +# Subv2 may have already been unmounted, so here we ignore all output.
>>> +# This may hide some errors like -EBUSY, but the next rm line would
>>> +# detect any still mounted subvolume so we're still safe.
>>> +$UMOUNT_PROG "$subv2_mount" &> /dev/null
>>> +
>>> +# If above unmount, especially subv2 is not properly unmounted,
>>> +# the rm should fail with some error message
>>> +rm -r -- "$subv1_mount" "$subv2_mount"
>
> nit:
> Nice, if unmounting $subvol2_mount fails, rm will fail as well.
> To capture the unmount error why not redirect its output
> to the test case’s full log:
>   unmount $subvol2_mount >> $seqres.full.

Because if $subvol2_mount is already unmounted (e.g. the interruption
happens after the umount funished), then above command will cause stderr
to pollute the golden output.

So at least we need to redirect both stderr and stdout, e.g.

   umount $subvol2_mount >> $seqres.full 2>&1

I'm fine with newer change, although that can always be done when
debugging, and with the initial test (and all the failures I have hit),
it should be good enough for now, until we got some new regression.

Thanks,
Qu

> If you're fine with it, Zorro can consider to change this
> during the merge. Nevertheless, the code looks good.
>
>    Reviewed-by: Anand Jain <anand.jain@oracle.com>
>
> Thx, Anand
>
diff mbox series

Patch

diff --git a/tests/btrfs/325 b/tests/btrfs/325
new file mode 100755
index 00000000..0f4984f1
--- /dev/null
+++ b/tests/btrfs/325
@@ -0,0 +1,107 @@ 
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 325
+#
+# Test that mounting a subvolume read-write will success, with another
+# subvolume being remounted RO/RW at background
+#
+. ./common/preamble
+_begin_fstest auto quick mount remount
+
+_fixed_by_kernel_commit xxxxxxxxxxxx \
+	"btrfs: fix mount failure due to remount races"
+
+_cleanup()
+{
+	cd /
+	rm -rf -- $tmp.*
+	[ -n "$mount_pid" ] && kill $mount_pid &> /dev/null
+	[ -n "$remount_pid" ] && kill $remount_pid &> /dev/null
+	wait
+	$UMOUNT_PROG "$subv1_mount" &> /dev/null
+	$UMOUNT_PROG "$subv2_mount" &> /dev/null
+	rm -rf -- "$subv1_mount" "$subv2_mount"
+}
+
+# For the extra mount points
+_require_test
+_require_scratch
+
+_scratch_mkfs >> $seqres.full 2>&1
+_scratch_mount
+$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol1 >> $seqres.full
+$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol2 >> $seqres.full
+_scratch_unmount
+
+subv1_mount="$TEST_DIR/subvol1_mount"
+subv2_mount="$TEST_DIR/subvol2_mount"
+rm -rf "$subv1_mount" "$subv2_mount"
+mkdir -p "$subv1_mount"
+mkdir -p "$subv2_mount"
+_mount "$SCRATCH_DEV" "$subv1_mount" -o subvol=subvol1
+
+# Subv1 remount workload, mount the subv1 and switching it between
+# RO and RW.
+remount_workload()
+{
+	trap "wait; exit" SIGTERM
+	while true; do
+		_mount -o remount,ro "$subv1_mount"
+		_mount -o remount,rw "$subv1_mount"
+	done
+}
+
+remount_workload &
+remount_pid=$!
+
+# Subv2 rw mount workload
+# For unpatched kernel, this will fail with -EBUSY.
+#
+# To maintain the per-subvolume RO/RW mount, if the existing btrfs is
+# mounted RO, unpatched btrfs will retry with its RO flag reverted,
+# then reconfigure the fs to RW.
+#
+# But such retry has no super block lock hold, thus if another remount
+# process has already remounted the fs RW, the attempt will fail and
+# return -EBUSY.
+#
+# Patched kernel will allow the superblock and mount point RO flags
+# to differ, and then hold the s_umount to reconfigure the super block
+# to RW, preventing any possible race.
+mount_workload()
+{
+	trap "wait; exit" SIGTERM
+	while true; do
+		_mount "$SCRATCH_DEV" "$subv2_mount"
+		$UMOUNT_PROG "$subv2_mount"
+	done
+}
+
+mount_workload &
+mount_pid=$!
+
+sleep $(( 10 * $TIME_FACTOR ))
+
+kill "$remount_pid" "$mount_pid"
+wait
+unset remount_pid mount_pid
+
+# Subv1 is always mounted, thus the umount should never fail.
+$UMOUNT_PROG "$subv1_mount"
+
+# Subv2 may have already been unmounted, so here we ignore all output.
+# This may hide some errors like -EBUSY, but the next rm line would
+# detect any still mounted subvolume so we're still safe.
+$UMOUNT_PROG "$subv2_mount" &> /dev/null
+
+# If above unmount, especially subv2 is not properly unmounted,
+# the rm should fail with some error message
+rm -r -- "$subv1_mount" "$subv2_mount"
+
+echo "Silence is golden"
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/325.out b/tests/btrfs/325.out
new file mode 100644
index 00000000..cf13795c
--- /dev/null
+++ b/tests/btrfs/325.out
@@ -0,0 +1,2 @@ 
+QA output created by 325
+Silence is golden