Message ID | 20241101102956.174733-1-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] btrfs: a new test case to verify mount behavior with background remounting | expand |
On Fri, Nov 1, 2024 at 10:30 AM Qu Wenruo <wqu@suse.com> wrote: > > [BUG] > When there is a process in the background remounting a btrfs, switching > between RO/RW, then another process try to mount another subvolume of > the same btrfs read-only, we can hit a race causing the RW mount to fail > with -EBUSY: > > [CAUSE] > During the btrfs mount, to support mounting different subvolumes with > different RO/RW flags, we have a small hack during the mount: > > Retry with matching RO flags if the initial mount fail with -EBUSY. > > The problem is, during that retry we do not hold any super block lock > (s_umount), this meanings there can be a remount process changing the RO > flags of the original fs super block. > > If so, we can have an EBUSY error during retry. > And this time we treat any failure as an error, without any retry and > cause the above EBUSY mount failure. > > [FIX] > The fix is already sent to the mailing list. > The fix is to allow btrfs to have different RO flag between super block > and mount point during mount, and if the RO flag mismatch, reconfigure > the fs to RW with s_umount hold, so that there will be no race. > > [TEST CASE] > The test case will create two processes: > > - Remounting an existing subvolume mount point > Switching between RO and RW > > - Mounting another subvolume RW > After a successful mount, unmount and retry. > > This is enough to trigger the -EBUSY error in less than 5 seconds. > To be extra safe, the test case will run for 10 seconds at least, and > follow TIME_FACTOR for extra loads. > > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Looks good. It fails with an unpatched kernel and passes with a patched kernel. Thanks. > --- > Changelog: > v2: > - Better signal handling for both remount and mount workload > Need to do the extra handling for both workload > > - Wait for any child process to exit before cleanup > > - Keep the stderr of the final rm command > > - Keep the stderr of the unmount of $subv1_mount > Which should always be mounted. > > - Use "$UMOUNT_PROG" instead of direct "umount" > > - Use "_mount" instead of direct "mount" > > - Fix a typo of "block" > --- > tests/btrfs/325 | 99 +++++++++++++++++++++++++++++++++++++++++++++ > tests/btrfs/325.out | 2 + > 2 files changed, 101 insertions(+) > create mode 100755 tests/btrfs/325 > create mode 100644 tests/btrfs/325.out > > diff --git a/tests/btrfs/325 b/tests/btrfs/325 > new file mode 100755 > index 00000000..ffa10c61 > --- /dev/null > +++ b/tests/btrfs/325 > @@ -0,0 +1,99 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved. > +# > +# FS QA Test 325 > +# > +# Test that mounting a subvolume read-write will success, with another > +# subvolume being remounted RO/RW at background > +# > +. ./common/preamble > +_begin_fstest auto quick mount remount > + > +_fixed_by_kernel_commit xxxxxxxxxxxx \ > + "btrfs: fix mount failure due to remount races" > + > +_cleanup() > +{ > + cd / > + kill "$remount_pid" "$mount_pid" &> /dev/null > + wait &> /dev/null > + $UMOUNT_PROG "$subv1_mount" &> /dev/null > + $UMOUNT_PROG "$subv2_mount" &> /dev/null > + rm -rf -- "$subv1_mount" &> /dev/null > + rm -rf -- "$subv2_mount" &> /dev/null > +} > + > +# For the extra mount points > +_require_test > +_require_scratch > + > +_scratch_mkfs >> $seqres.full 2>&1 > +_scratch_mount > +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol1 >> $seqres.full > +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol2 >> $seqres.full > +_scratch_unmount > + > +subv1_mount="$TEST_DIR/subvol1_mount" > +subv2_mount="$TEST_DIR/subvol2_mount" > +mkdir -p "$subv1_mount" > +mkdir -p "$subv2_mount" > +_mount "$SCRATCH_DEV" "$subv1_mount" -o subvol=subvol1 > + > +# Subv1 remount workload, mount the subv1 and switching it between > +# RO and RW. > +remount_workload() > +{ > + trap "wait; exit" SIGTERM > + while true; do > + _mount -o remount,ro "$subv1_mount" > + _mount -o remount,rw "$subv1_mount" > + done > +} > + > +remount_workload & > +remount_pid=$! > + > +# Subv2 rw mount workload > +# For unpatched kernel, this will fail with -EBUSY. > +# > +# To maintain the per-subvolume RO/RW mount, if the existing btrfs is > +# mounted RO, unpatched btrfs will retry with its RO flag reverted, > +# then reconfigure the fs to RW. > +# > +# But such retry has no super block lock hold, thus if another remount > +# process has already remounted the fs RW, the attempt will fail and > +# return -EBUSY. > +# > +# Patched kernel will allow the superblock and mount point RO flags > +# to differ, and then hold the s_umount to reconfigure the super block > +# to RW, preventing any possible race. > +mount_workload() > +{ > + trap "wait; exit" SIGTERM > + while true; do > + _mount "$SCRATCH_DEV" "$subv2_mount" > + $UMOUNT_PROG "$subv2_mount" > + done > +} > + > +mount_workload & > +mount_pid=$! > + > +sleep $(( 10 * $TIME_FACTOR )) > + > +kill "$remount_pid" "$mount_pid" > +wait > + > +# subv1 is always mounted, thus the umount should not fail. > +$UMOUNT_PROG "$subv1_mount" > +$UMOUNT_PROG "$subv2_mount" &> /dev/null > + > +rm -rf -- "$subv1_mount" "$subv2_mount" > + > + > +echo "Silence is golden" > + > +# success, all done > +status=0 > +exit > diff --git a/tests/btrfs/325.out b/tests/btrfs/325.out > new file mode 100644 > index 00000000..cf13795c > --- /dev/null > +++ b/tests/btrfs/325.out > @@ -0,0 +1,2 @@ > +QA output created by 325 > +Silence is golden > -- > 2.46.0 > >
On Fri, Nov 01, 2024 at 08:59:56PM +1030, Qu Wenruo wrote: > [BUG] > When there is a process in the background remounting a btrfs, switching > between RO/RW, then another process try to mount another subvolume of > the same btrfs read-only, we can hit a race causing the RW mount to fail > with -EBUSY: > > [CAUSE] > During the btrfs mount, to support mounting different subvolumes with > different RO/RW flags, we have a small hack during the mount: > > Retry with matching RO flags if the initial mount fail with -EBUSY. > > The problem is, during that retry we do not hold any super block lock > (s_umount), this meanings there can be a remount process changing the RO > flags of the original fs super block. > > If so, we can have an EBUSY error during retry. > And this time we treat any failure as an error, without any retry and > cause the above EBUSY mount failure. > > [FIX] > The fix is already sent to the mailing list. > The fix is to allow btrfs to have different RO flag between super block > and mount point during mount, and if the RO flag mismatch, reconfigure > the fs to RW with s_umount hold, so that there will be no race. > > [TEST CASE] > The test case will create two processes: > > - Remounting an existing subvolume mount point > Switching between RO and RW > > - Mounting another subvolume RW > After a successful mount, unmount and retry. > > This is enough to trigger the -EBUSY error in less than 5 seconds. > To be extra safe, the test case will run for 10 seconds at least, and > follow TIME_FACTOR for extra loads. > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > Changelog: > v2: > - Better signal handling for both remount and mount workload > Need to do the extra handling for both workload > > - Wait for any child process to exit before cleanup > > - Keep the stderr of the final rm command > > - Keep the stderr of the unmount of $subv1_mount > Which should always be mounted. > > - Use "$UMOUNT_PROG" instead of direct "umount" > > - Use "_mount" instead of direct "mount" > > - Fix a typo of "block" > --- Thanks for this new case. A few of picky review points below :) > tests/btrfs/325 | 99 +++++++++++++++++++++++++++++++++++++++++++++ > tests/btrfs/325.out | 2 + > 2 files changed, 101 insertions(+) > create mode 100755 tests/btrfs/325 > create mode 100644 tests/btrfs/325.out > > diff --git a/tests/btrfs/325 b/tests/btrfs/325 > new file mode 100755 > index 00000000..ffa10c61 > --- /dev/null > +++ b/tests/btrfs/325 > @@ -0,0 +1,99 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved. > +# > +# FS QA Test 325 > +# > +# Test that mounting a subvolume read-write will success, with another > +# subvolume being remounted RO/RW at background > +# > +. ./common/preamble > +_begin_fstest auto quick mount remount > + > +_fixed_by_kernel_commit xxxxxxxxxxxx \ > + "btrfs: fix mount failure due to remount races" > + > +_cleanup() > +{ > + cd / rm -r -f $tmp.* > + kill "$remount_pid" "$mount_pid" &> /dev/null With below "unset" (see below), we can: [ -n "$mount_pid" ] && kill $mount_pid [ -n "$remount_pid" ] && kill $remount_pid > + wait &> /dev/null Just curious, what can cause "wait" command print something? > + $UMOUNT_PROG "$subv1_mount" &> /dev/null > + $UMOUNT_PROG "$subv2_mount" &> /dev/null > + rm -rf -- "$subv1_mount" &> /dev/null > + rm -rf -- "$subv2_mount" &> /dev/null rm "-f" prints nothing, so rm -rf -- "$subv1_mount" "$subv2_mount" > +} > + > +# For the extra mount points > +_require_test > +_require_scratch > + > +_scratch_mkfs >> $seqres.full 2>&1 > +_scratch_mount > +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol1 >> $seqres.full > +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol2 >> $seqres.full Hmm... I wondering what'll happen, if remove these two lines then run this test on other filesystems :) > +_scratch_unmount > + > +subv1_mount="$TEST_DIR/subvol1_mount" > +subv2_mount="$TEST_DIR/subvol2_mount" Better to try to remove them at first. We can't make sure these names never be existed in $TEST_DIR. > +mkdir -p "$subv1_mount" > +mkdir -p "$subv2_mount" > +_mount "$SCRATCH_DEV" "$subv1_mount" -o subvol=subvol1 > + > +# Subv1 remount workload, mount the subv1 and switching it between > +# RO and RW. > +remount_workload() > +{ > + trap "wait; exit" SIGTERM > + while true; do > + _mount -o remount,ro "$subv1_mount" > + _mount -o remount,rw "$subv1_mount" > + done > +} > + > +remount_workload & > +remount_pid=$! > + > +# Subv2 rw mount workload > +# For unpatched kernel, this will fail with -EBUSY. > +# > +# To maintain the per-subvolume RO/RW mount, if the existing btrfs is > +# mounted RO, unpatched btrfs will retry with its RO flag reverted, > +# then reconfigure the fs to RW. > +# > +# But such retry has no super block lock hold, thus if another remount > +# process has already remounted the fs RW, the attempt will fail and > +# return -EBUSY. > +# > +# Patched kernel will allow the superblock and mount point RO flags > +# to differ, and then hold the s_umount to reconfigure the super block > +# to RW, preventing any possible race. > +mount_workload() > +{ > + trap "wait; exit" SIGTERM > + while true; do > + _mount "$SCRATCH_DEV" "$subv2_mount" > + $UMOUNT_PROG "$subv2_mount" > + done > +} > + > +mount_workload & > +mount_pid=$! > + > +sleep $(( 10 * $TIME_FACTOR )) > + > +kill "$remount_pid" "$mount_pid" > +wait unset remount_pid mount_pid to avoid later kill (in _cleanup) kill other processes. > + > +# subv1 is always mounted, thus the umount should not fail. > +$UMOUNT_PROG "$subv1_mount" > +$UMOUNT_PROG "$subv2_mount" &> /dev/null > + > +rm -rf -- "$subv1_mount" "$subv2_mount" Do this in _cleanup (as above), so this line can be removed if it's not a necessary test step of this case. Others look good to me. Thanks, Zorro > + > + > +echo "Silence is golden" > + > +# success, all done > +status=0 > +exit > diff --git a/tests/btrfs/325.out b/tests/btrfs/325.out > new file mode 100644 > index 00000000..cf13795c > --- /dev/null > +++ b/tests/btrfs/325.out > @@ -0,0 +1,2 @@ > +QA output created by 325 > +Silence is golden > -- > 2.46.0 > >
在 2024/11/1 23:25, Zorro Lang 写道: [...] >> +_cleanup() >> +{ >> + cd / > > rm -r -f $tmp.* The test doesn't utilize any temporary file AFAIK. > >> + kill "$remount_pid" "$mount_pid" &> /dev/null > > With below "unset" (see below), we can: > > [ -n "$mount_pid" ] && kill $mount_pid > [ -n "$remount_pid" ] && kill $remount_pid If both pids unset, kill will do nothing but shows its usage. In that case since we re-direct both stderr and stdout, it will not cause anything wrong. > >> + wait &> /dev/null > > Just curious, what can cause "wait" command print something? "wait" can output errors for invalid options or the pid is not a child one. But without any option it doesn't seem to output any error message. > >> + $UMOUNT_PROG "$subv1_mount" &> /dev/null >> + $UMOUNT_PROG "$subv2_mount" &> /dev/null >> + rm -rf -- "$subv1_mount" &> /dev/null >> + rm -rf -- "$subv2_mount" &> /dev/null > > rm "-f" prints nothing, so > > rm -rf -- "$subv1_mount" "$subv2_mount" Nope. it can output error like -EBUSY if any of the mount point is still being mounted. Thus to be extra safe I prefer to do all the redirection in the cleanup function. As I have hit problems inside the cleanup function too many times, mostly because the previous version is lacking the signal handling for each workload. > >> +} >> + >> +# For the extra mount points >> +_require_test >> +_require_scratch >> + >> +_scratch_mkfs >> $seqres.full 2>&1 >> +_scratch_mount >> +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol1 >> $seqres.full >> +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol2 >> $seqres.full > > Hmm... I wondering what'll happen, if remove these two lines then run this test > on other filesystems :) Not sure about other filesystems, but do other filesystems allows different RO/RW flags for the same fs? > >> +_scratch_unmount >> + >> +subv1_mount="$TEST_DIR/subvol1_mount" >> +subv2_mount="$TEST_DIR/subvol2_mount" > > Better to try to remove them at first. We can't make sure these names never > be existed in $TEST_DIR. > >> +mkdir -p "$subv1_mount" >> +mkdir -p "$subv2_mount" >> +_mount "$SCRATCH_DEV" "$subv1_mount" -o subvol=subvol1 >> + >> +# Subv1 remount workload, mount the subv1 and switching it between >> +# RO and RW. >> +remount_workload() >> +{ >> + trap "wait; exit" SIGTERM >> + while true; do >> + _mount -o remount,ro "$subv1_mount" >> + _mount -o remount,rw "$subv1_mount" >> + done >> +} >> + >> +remount_workload & >> +remount_pid=$! >> + >> +# Subv2 rw mount workload >> +# For unpatched kernel, this will fail with -EBUSY. >> +# >> +# To maintain the per-subvolume RO/RW mount, if the existing btrfs is >> +# mounted RO, unpatched btrfs will retry with its RO flag reverted, >> +# then reconfigure the fs to RW. >> +# >> +# But such retry has no super block lock hold, thus if another remount >> +# process has already remounted the fs RW, the attempt will fail and >> +# return -EBUSY. >> +# >> +# Patched kernel will allow the superblock and mount point RO flags >> +# to differ, and then hold the s_umount to reconfigure the super block >> +# to RW, preventing any possible race. >> +mount_workload() >> +{ >> + trap "wait; exit" SIGTERM >> + while true; do >> + _mount "$SCRATCH_DEV" "$subv2_mount" >> + $UMOUNT_PROG "$subv2_mount" >> + done >> +} >> + >> +mount_workload & >> +mount_pid=$! >> + >> +sleep $(( 10 * $TIME_FACTOR )) >> + >> +kill "$remount_pid" "$mount_pid" >> +wait > > unset remount_pid mount_pid > > to avoid later kill (in _cleanup) kill other processes. IIRC the child pid won't be reused until the parent process exits. > >> + >> +# subv1 is always mounted, thus the umount should not fail. >> +$UMOUNT_PROG "$subv1_mount" >> +$UMOUNT_PROG "$subv2_mount" &> /dev/null >> + >> +rm -rf -- "$subv1_mount" "$subv2_mount" > > Do this in _cleanup (as above), so this line can be removed if it's not a > necessary test step of this case. This is an extra safety check, to make sure the unmount is properly done. And that's why here we do not re-direct the output, unlike cleanup. As I have already hit cases where the rm returned EBUSY error due to races with the children processes not handling signal correctly. Thanks, Qu > > Others look good to me. > > Thanks, > Zorro > >> + >> + >> +echo "Silence is golden" >> + >> +# success, all done >> +status=0 >> +exit >> diff --git a/tests/btrfs/325.out b/tests/btrfs/325.out >> new file mode 100644 >> index 00000000..cf13795c >> --- /dev/null >> +++ b/tests/btrfs/325.out >> @@ -0,0 +1,2 @@ >> +QA output created by 325 >> +Silence is golden >> -- >> 2.46.0 >> >> > >
On Sat, Nov 02, 2024 at 07:08:44AM +1030, Qu Wenruo wrote: > >> +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol1 >> $seqres.full > >> +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol2 >> $seqres.full > > > > Hmm... I wondering what'll happen, if remove these two lines then run this test > > on other filesystems :) > > Not sure about other filesystems, but do other filesystems allows > different RO/RW flags for the same fs? The subvolumes utilize the bind mount, so other filesystems can do that as well but this will also do the synchronization inside VFS, while the subvolume mount is called internally.
diff --git a/tests/btrfs/325 b/tests/btrfs/325 new file mode 100755 index 00000000..ffa10c61 --- /dev/null +++ b/tests/btrfs/325 @@ -0,0 +1,99 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved. +# +# FS QA Test 325 +# +# Test that mounting a subvolume read-write will success, with another +# subvolume being remounted RO/RW at background +# +. ./common/preamble +_begin_fstest auto quick mount remount + +_fixed_by_kernel_commit xxxxxxxxxxxx \ + "btrfs: fix mount failure due to remount races" + +_cleanup() +{ + cd / + kill "$remount_pid" "$mount_pid" &> /dev/null + wait &> /dev/null + $UMOUNT_PROG "$subv1_mount" &> /dev/null + $UMOUNT_PROG "$subv2_mount" &> /dev/null + rm -rf -- "$subv1_mount" &> /dev/null + rm -rf -- "$subv2_mount" &> /dev/null +} + +# For the extra mount points +_require_test +_require_scratch + +_scratch_mkfs >> $seqres.full 2>&1 +_scratch_mount +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol1 >> $seqres.full +$BTRFS_UTIL_PROG subvolume create $SCRATCH_MNT/subvol2 >> $seqres.full +_scratch_unmount + +subv1_mount="$TEST_DIR/subvol1_mount" +subv2_mount="$TEST_DIR/subvol2_mount" +mkdir -p "$subv1_mount" +mkdir -p "$subv2_mount" +_mount "$SCRATCH_DEV" "$subv1_mount" -o subvol=subvol1 + +# Subv1 remount workload, mount the subv1 and switching it between +# RO and RW. +remount_workload() +{ + trap "wait; exit" SIGTERM + while true; do + _mount -o remount,ro "$subv1_mount" + _mount -o remount,rw "$subv1_mount" + done +} + +remount_workload & +remount_pid=$! + +# Subv2 rw mount workload +# For unpatched kernel, this will fail with -EBUSY. +# +# To maintain the per-subvolume RO/RW mount, if the existing btrfs is +# mounted RO, unpatched btrfs will retry with its RO flag reverted, +# then reconfigure the fs to RW. +# +# But such retry has no super block lock hold, thus if another remount +# process has already remounted the fs RW, the attempt will fail and +# return -EBUSY. +# +# Patched kernel will allow the superblock and mount point RO flags +# to differ, and then hold the s_umount to reconfigure the super block +# to RW, preventing any possible race. +mount_workload() +{ + trap "wait; exit" SIGTERM + while true; do + _mount "$SCRATCH_DEV" "$subv2_mount" + $UMOUNT_PROG "$subv2_mount" + done +} + +mount_workload & +mount_pid=$! + +sleep $(( 10 * $TIME_FACTOR )) + +kill "$remount_pid" "$mount_pid" +wait + +# subv1 is always mounted, thus the umount should not fail. +$UMOUNT_PROG "$subv1_mount" +$UMOUNT_PROG "$subv2_mount" &> /dev/null + +rm -rf -- "$subv1_mount" "$subv2_mount" + + +echo "Silence is golden" + +# success, all done +status=0 +exit diff --git a/tests/btrfs/325.out b/tests/btrfs/325.out new file mode 100644 index 00000000..cf13795c --- /dev/null +++ b/tests/btrfs/325.out @@ -0,0 +1,2 @@ +QA output created by 325 +Silence is golden
[BUG] When there is a process in the background remounting a btrfs, switching between RO/RW, then another process try to mount another subvolume of the same btrfs read-only, we can hit a race causing the RW mount to fail with -EBUSY: [CAUSE] During the btrfs mount, to support mounting different subvolumes with different RO/RW flags, we have a small hack during the mount: Retry with matching RO flags if the initial mount fail with -EBUSY. The problem is, during that retry we do not hold any super block lock (s_umount), this meanings there can be a remount process changing the RO flags of the original fs super block. If so, we can have an EBUSY error during retry. And this time we treat any failure as an error, without any retry and cause the above EBUSY mount failure. [FIX] The fix is already sent to the mailing list. The fix is to allow btrfs to have different RO flag between super block and mount point during mount, and if the RO flag mismatch, reconfigure the fs to RW with s_umount hold, so that there will be no race. [TEST CASE] The test case will create two processes: - Remounting an existing subvolume mount point Switching between RO and RW - Mounting another subvolume RW After a successful mount, unmount and retry. This is enough to trigger the -EBUSY error in less than 5 seconds. To be extra safe, the test case will run for 10 seconds at least, and follow TIME_FACTOR for extra loads. Signed-off-by: Qu Wenruo <wqu@suse.com> --- Changelog: v2: - Better signal handling for both remount and mount workload Need to do the extra handling for both workload - Wait for any child process to exit before cleanup - Keep the stderr of the final rm command - Keep the stderr of the unmount of $subv1_mount Which should always be mounted. - Use "$UMOUNT_PROG" instead of direct "umount" - Use "_mount" instead of direct "mount" - Fix a typo of "block" --- tests/btrfs/325 | 99 +++++++++++++++++++++++++++++++++++++++++++++ tests/btrfs/325.out | 2 + 2 files changed, 101 insertions(+) create mode 100755 tests/btrfs/325 create mode 100644 tests/btrfs/325.out