Message ID | 20200520075746.16245-1-wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] btrfs: Add a test for dead looping balance after balance cancel | expand |
On 20/05/2020 09:57, Qu Wenruo wrote:
> The ifx is titled "btrfs: relocation: Clear the DEAD_RELOC_TREE bit for
Nit: s/ifx/fix/
On Wed, May 20, 2020 at 8:59 AM Qu Wenruo <wqu@suse.com> wrote: > > Test if canceling a running balance can cause later balance to dead > loop. > > The ifx is titled "btrfs: relocation: Clear the DEAD_RELOC_TREE bit for > orphan roots to prevent runaway balance". > > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > Changelog: > v2: > - Remove lsof debug output > v3: > - Remove ps debug output > --- > tests/btrfs/213 | 64 +++++++++++++++++++++++++++++++++++++++++++++ > tests/btrfs/213.out | 2 ++ > tests/btrfs/group | 1 + > 3 files changed, 67 insertions(+) > create mode 100755 tests/btrfs/213 > create mode 100644 tests/btrfs/213.out > > diff --git a/tests/btrfs/213 b/tests/btrfs/213 > new file mode 100755 > index 00000000..f56b0279 > --- /dev/null > +++ b/tests/btrfs/213 > @@ -0,0 +1,64 @@ > +#! /bin/bash > +# SPDX-License-Identifier: GPL-2.0 > +# Copyright (C) 2020 SUSE Linux Products GmbH. All Rights Reserved. > +# > +# FS QA Test 213 > +# > +# Test if canceling a running balance can lead to dead looping balance > +# > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +tmp=/tmp/$$ > +status=1 # failure is the default! > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +_cleanup() > +{ > + cd / > + rm -f $tmp.* > +} > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > + > +# remove previous $seqres.full before test > +rm -f $seqres.full > + > +# Modify as appropriate. > +_supported_fs btrfs > +_supported_os Linux > +_require_scratch > +_require_command "$KILLALL_PROG" killall > + > +_scratch_mkfs >> $seqres.full > +_scratch_mount > + > +runtime=4 > + > +# Create enough IO so that we need around $runtime seconds to relocate it > +dd if=/dev/zero bs=1M of="$SCRATCH_MNT/file" oflag=sync status=none \ > + &> /dev/null & > +dd_pid=$! > +sleep $runtime > +"$KILLALL_PROG" -q dd &> /dev/null Do we really need the killall program? There's only one dd process. We should also kill the dd process at _cleanup(), as killing the test during the sleep above will result in the dd process not being killed. > +wait $dd_pid > + > +# Now balance should take at least $runtime seconds, we can cancel it at > +# $runtime/2 to ensure a success cancel. > +$BTRFS_UTIL_PROG balance start -d --bg "$SCRATCH_MNT" > +sleep $(($runtime / 2)) > +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" > + > +# Now check if we can finish relocating metadata, which should finish very > +# quickly > +$BTRFS_UTIL_PROG balance start -m "$SCRATCH_MNT" >> $seqres.full Why redirect this one to $seqres.full and not the other balance? What kind of useful information it provides? > + > +echo "Silence is golden" > + > +# success, all done > +status=0 > +exit > diff --git a/tests/btrfs/213.out b/tests/btrfs/213.out > new file mode 100644 > index 00000000..bd8f2430 > --- /dev/null > +++ b/tests/btrfs/213.out > @@ -0,0 +1,2 @@ > +QA output created by 213 > +Silence is golden > diff --git a/tests/btrfs/group b/tests/btrfs/group > index 8d65bddd..fe4d5bb3 100644 > --- a/tests/btrfs/group > +++ b/tests/btrfs/group > @@ -215,3 +215,4 @@ > 210 auto quick qgroup snapshot > 211 auto quick log prealloc > 212 auto balance dangerous > +213 auto fast balance dangerous fast -> quick Thanks. > -- > 2.26.2 >
On 2020/5/20 下午5:29, Filipe Manana wrote: > On Wed, May 20, 2020 at 8:59 AM Qu Wenruo <wqu@suse.com> wrote: >> >> Test if canceling a running balance can cause later balance to dead >> loop. >> >> The ifx is titled "btrfs: relocation: Clear the DEAD_RELOC_TREE bit for >> orphan roots to prevent runaway balance". >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> Changelog: >> v2: >> - Remove lsof debug output >> v3: >> - Remove ps debug output >> --- >> tests/btrfs/213 | 64 +++++++++++++++++++++++++++++++++++++++++++++ >> tests/btrfs/213.out | 2 ++ >> tests/btrfs/group | 1 + >> 3 files changed, 67 insertions(+) >> create mode 100755 tests/btrfs/213 >> create mode 100644 tests/btrfs/213.out >> >> diff --git a/tests/btrfs/213 b/tests/btrfs/213 >> new file mode 100755 >> index 00000000..f56b0279 >> --- /dev/null >> +++ b/tests/btrfs/213 >> @@ -0,0 +1,64 @@ >> +#! /bin/bash >> +# SPDX-License-Identifier: GPL-2.0 >> +# Copyright (C) 2020 SUSE Linux Products GmbH. All Rights Reserved. >> +# >> +# FS QA Test 213 >> +# >> +# Test if canceling a running balance can lead to dead looping balance >> +# >> +seq=`basename $0` >> +seqres=$RESULT_DIR/$seq >> +echo "QA output created by $seq" >> + >> +here=`pwd` >> +tmp=/tmp/$$ >> +status=1 # failure is the default! >> +trap "_cleanup; exit \$status" 0 1 2 3 15 >> + >> +_cleanup() >> +{ >> + cd / >> + rm -f $tmp.* >> +} >> + >> +# get standard environment, filters and checks >> +. ./common/rc >> +. ./common/filter >> + >> +# remove previous $seqres.full before test >> +rm -f $seqres.full >> + >> +# Modify as appropriate. >> +_supported_fs btrfs >> +_supported_os Linux >> +_require_scratch >> +_require_command "$KILLALL_PROG" killall >> + >> +_scratch_mkfs >> $seqres.full >> +_scratch_mount >> + >> +runtime=4 >> + >> +# Create enough IO so that we need around $runtime seconds to relocate it >> +dd if=/dev/zero bs=1M of="$SCRATCH_MNT/file" oflag=sync status=none \ >> + &> /dev/null & >> +dd_pid=$! >> +sleep $runtime >> +"$KILLALL_PROG" -q dd &> /dev/null > > Do we really need the killall program? There's only one dd process. > > We should also kill the dd process at _cleanup(), as killing the test > during the sleep above will result in the dd process not being killed. The main problem here is, I can't find a good way to kill dd. plain 'kill $dd_pid' doesn't seem to kill it properly, as my debug lsof/ps still shows dd running and failed to unmount the fs. 'kill -KILL $dd_pid' kills it well, but causes extra output for the terminated dd. Only 'killall -q dd' works as expected. Any good advice on this? > >> +wait $dd_pid >> + >> +# Now balance should take at least $runtime seconds, we can cancel it at >> +# $runtime/2 to ensure a success cancel. >> +$BTRFS_UTIL_PROG balance start -d --bg "$SCRATCH_MNT" >> +sleep $(($runtime / 2)) >> +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" >> + >> +# Now check if we can finish relocating metadata, which should finish very >> +# quickly >> +$BTRFS_UTIL_PROG balance start -m "$SCRATCH_MNT" >> $seqres.full > > Why redirect this one to $seqres.full and not the other balance? What > kind of useful information it provides? Not really much, just an indicator to show that the balance finishes as expected. And we don't want to output it golden output, as mkfs change may create more metadata chunks and cause difference. For other ones, like the one to be canceled, we don't really care that much. Thanks, Qu > >> + >> +echo "Silence is golden" >> + >> +# success, all done >> +status=0 >> +exit >> diff --git a/tests/btrfs/213.out b/tests/btrfs/213.out >> new file mode 100644 >> index 00000000..bd8f2430 >> --- /dev/null >> +++ b/tests/btrfs/213.out >> @@ -0,0 +1,2 @@ >> +QA output created by 213 >> +Silence is golden >> diff --git a/tests/btrfs/group b/tests/btrfs/group >> index 8d65bddd..fe4d5bb3 100644 >> --- a/tests/btrfs/group >> +++ b/tests/btrfs/group >> @@ -215,3 +215,4 @@ >> 210 auto quick qgroup snapshot >> 211 auto quick log prealloc >> 212 auto balance dangerous >> +213 auto fast balance dangerous > > fast -> quick > > Thanks. > >> -- >> 2.26.2 >> > >
On 2020/5/20 下午6:02, Qu Wenruo wrote: > > > On 2020/5/20 下午5:29, Filipe Manana wrote: >> On Wed, May 20, 2020 at 8:59 AM Qu Wenruo <wqu@suse.com> wrote: >>> >>> Test if canceling a running balance can cause later balance to dead >>> loop. >>> >>> The ifx is titled "btrfs: relocation: Clear the DEAD_RELOC_TREE bit for >>> orphan roots to prevent runaway balance". >>> >>> Signed-off-by: Qu Wenruo <wqu@suse.com> >>> --- >>> Changelog: >>> v2: >>> - Remove lsof debug output >>> v3: >>> - Remove ps debug output >>> --- >>> tests/btrfs/213 | 64 +++++++++++++++++++++++++++++++++++++++++++++ >>> tests/btrfs/213.out | 2 ++ >>> tests/btrfs/group | 1 + >>> 3 files changed, 67 insertions(+) >>> create mode 100755 tests/btrfs/213 >>> create mode 100644 tests/btrfs/213.out >>> >>> diff --git a/tests/btrfs/213 b/tests/btrfs/213 >>> new file mode 100755 >>> index 00000000..f56b0279 >>> --- /dev/null >>> +++ b/tests/btrfs/213 >>> @@ -0,0 +1,64 @@ >>> +#! /bin/bash >>> +# SPDX-License-Identifier: GPL-2.0 >>> +# Copyright (C) 2020 SUSE Linux Products GmbH. All Rights Reserved. >>> +# >>> +# FS QA Test 213 >>> +# >>> +# Test if canceling a running balance can lead to dead looping balance >>> +# >>> +seq=`basename $0` >>> +seqres=$RESULT_DIR/$seq >>> +echo "QA output created by $seq" >>> + >>> +here=`pwd` >>> +tmp=/tmp/$$ >>> +status=1 # failure is the default! >>> +trap "_cleanup; exit \$status" 0 1 2 3 15 >>> + >>> +_cleanup() >>> +{ >>> + cd / >>> + rm -f $tmp.* >>> +} >>> + >>> +# get standard environment, filters and checks >>> +. ./common/rc >>> +. ./common/filter >>> + >>> +# remove previous $seqres.full before test >>> +rm -f $seqres.full >>> + >>> +# Modify as appropriate. >>> +_supported_fs btrfs >>> +_supported_os Linux >>> +_require_scratch >>> +_require_command "$KILLALL_PROG" killall >>> + >>> +_scratch_mkfs >> $seqres.full >>> +_scratch_mount >>> + >>> +runtime=4 >>> + >>> +# Create enough IO so that we need around $runtime seconds to relocate it >>> +dd if=/dev/zero bs=1M of="$SCRATCH_MNT/file" oflag=sync status=none \ >>> + &> /dev/null & >>> +dd_pid=$! >>> +sleep $runtime >>> +"$KILLALL_PROG" -q dd &> /dev/null >> >> Do we really need the killall program? There's only one dd process. >> >> We should also kill the dd process at _cleanup(), as killing the test >> during the sleep above will result in the dd process not being killed. > > The main problem here is, I can't find a good way to kill dd. > plain 'kill $dd_pid' doesn't seem to kill it properly, as my debug > lsof/ps still shows dd running and failed to unmount the fs. > > 'kill -KILL $dd_pid' kills it well, but causes extra output for the > terminated dd. > > Only 'killall -q dd' works as expected. > > Any good advice on this? Oh, the fstests "kindly" overrides dd, "kill $dd_pid" only kills the child bash running that dd function, not the real dd command. I'll use xfs_io to ensure we get no extra wrapper. Thanks, Qu > >> >>> +wait $dd_pid >>> + >>> +# Now balance should take at least $runtime seconds, we can cancel it at >>> +# $runtime/2 to ensure a success cancel. >>> +$BTRFS_UTIL_PROG balance start -d --bg "$SCRATCH_MNT" >>> +sleep $(($runtime / 2)) >>> +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" >>> + >>> +# Now check if we can finish relocating metadata, which should finish very >>> +# quickly >>> +$BTRFS_UTIL_PROG balance start -m "$SCRATCH_MNT" >> $seqres.full >> >> Why redirect this one to $seqres.full and not the other balance? What >> kind of useful information it provides? > > Not really much, just an indicator to show that the balance finishes as > expected. > And we don't want to output it golden output, as mkfs change may create > more metadata chunks and cause difference. > > For other ones, like the one to be canceled, we don't really care that much. > > Thanks, > Qu >> >>> + >>> +echo "Silence is golden" >>> + >>> +# success, all done >>> +status=0 >>> +exit >>> diff --git a/tests/btrfs/213.out b/tests/btrfs/213.out >>> new file mode 100644 >>> index 00000000..bd8f2430 >>> --- /dev/null >>> +++ b/tests/btrfs/213.out >>> @@ -0,0 +1,2 @@ >>> +QA output created by 213 >>> +Silence is golden >>> diff --git a/tests/btrfs/group b/tests/btrfs/group >>> index 8d65bddd..fe4d5bb3 100644 >>> --- a/tests/btrfs/group >>> +++ b/tests/btrfs/group >>> @@ -215,3 +215,4 @@ >>> 210 auto quick qgroup snapshot >>> 211 auto quick log prealloc >>> 212 auto balance dangerous >>> +213 auto fast balance dangerous >> >> fast -> quick >> >> Thanks. >> >>> -- >>> 2.26.2 >>> >> >> >
On Wed, May 20, 2020 at 12:32 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote: > > > > On 2020/5/20 下午6:02, Qu Wenruo wrote: > > > > > > On 2020/5/20 下午5:29, Filipe Manana wrote: > >> On Wed, May 20, 2020 at 8:59 AM Qu Wenruo <wqu@suse.com> wrote: > >>> > >>> Test if canceling a running balance can cause later balance to dead > >>> loop. > >>> > >>> The ifx is titled "btrfs: relocation: Clear the DEAD_RELOC_TREE bit for > >>> orphan roots to prevent runaway balance". > >>> > >>> Signed-off-by: Qu Wenruo <wqu@suse.com> > >>> --- > >>> Changelog: > >>> v2: > >>> - Remove lsof debug output > >>> v3: > >>> - Remove ps debug output > >>> --- > >>> tests/btrfs/213 | 64 +++++++++++++++++++++++++++++++++++++++++++++ > >>> tests/btrfs/213.out | 2 ++ > >>> tests/btrfs/group | 1 + > >>> 3 files changed, 67 insertions(+) > >>> create mode 100755 tests/btrfs/213 > >>> create mode 100644 tests/btrfs/213.out > >>> > >>> diff --git a/tests/btrfs/213 b/tests/btrfs/213 > >>> new file mode 100755 > >>> index 00000000..f56b0279 > >>> --- /dev/null > >>> +++ b/tests/btrfs/213 > >>> @@ -0,0 +1,64 @@ > >>> +#! /bin/bash > >>> +# SPDX-License-Identifier: GPL-2.0 > >>> +# Copyright (C) 2020 SUSE Linux Products GmbH. All Rights Reserved. > >>> +# > >>> +# FS QA Test 213 > >>> +# > >>> +# Test if canceling a running balance can lead to dead looping balance > >>> +# > >>> +seq=`basename $0` > >>> +seqres=$RESULT_DIR/$seq > >>> +echo "QA output created by $seq" > >>> + > >>> +here=`pwd` > >>> +tmp=/tmp/$$ > >>> +status=1 # failure is the default! > >>> +trap "_cleanup; exit \$status" 0 1 2 3 15 > >>> + > >>> +_cleanup() > >>> +{ > >>> + cd / > >>> + rm -f $tmp.* > >>> +} > >>> + > >>> +# get standard environment, filters and checks > >>> +. ./common/rc > >>> +. ./common/filter > >>> + > >>> +# remove previous $seqres.full before test > >>> +rm -f $seqres.full > >>> + > >>> +# Modify as appropriate. > >>> +_supported_fs btrfs > >>> +_supported_os Linux > >>> +_require_scratch > >>> +_require_command "$KILLALL_PROG" killall > >>> + > >>> +_scratch_mkfs >> $seqres.full > >>> +_scratch_mount > >>> + > >>> +runtime=4 > >>> + > >>> +# Create enough IO so that we need around $runtime seconds to relocate it > >>> +dd if=/dev/zero bs=1M of="$SCRATCH_MNT/file" oflag=sync status=none \ > >>> + &> /dev/null & > >>> +dd_pid=$! > >>> +sleep $runtime > >>> +"$KILLALL_PROG" -q dd &> /dev/null > >> > >> Do we really need the killall program? There's only one dd process. > >> > >> We should also kill the dd process at _cleanup(), as killing the test > >> during the sleep above will result in the dd process not being killed. > > > > The main problem here is, I can't find a good way to kill dd. > > plain 'kill $dd_pid' doesn't seem to kill it properly, as my debug > > lsof/ps still shows dd running and failed to unmount the fs. > > > > 'kill -KILL $dd_pid' kills it well, but causes extra output for the > > terminated dd. > > > > Only 'killall -q dd' works as expected. > > > > Any good advice on this? > > Oh, the fstests "kindly" overrides dd, "kill $dd_pid" only kills the > child bash running that dd function, not the real dd command. > I'll use xfs_io to ensure we get no extra wrapper. Indeed, I was doing this in the test: ps -eo pid,ppid,pgid,comm > /tmp/ps.txt my_pid=$$ echo "dd_pid = $dd_pid mypid = $my_pid" >> /tmp/ps.txt kill -9 $dd_pid wait $dd_pid And was noticing that the ps.txt file had: 23467 937 23467 check ... 23840 23467 23467 213 ... 24034 23840 23467 213 24039 24034 23467 dd ... dd_pid = 24034 mypid = 23840 So dd_pid matched a bash process running the test, which is a child of the bash process that invoked dd. Before digging further I saw your reply and checked that common/rc provides a function named dd() which is a wrapper to dd and it invokes dd twice, whence you needed killall. Interesting, wasn't aware of that wrapper. Thanks. > > Thanks, > Qu > > > > >> > >>> +wait $dd_pid > >>> + > >>> +# Now balance should take at least $runtime seconds, we can cancel it at > >>> +# $runtime/2 to ensure a success cancel. > >>> +$BTRFS_UTIL_PROG balance start -d --bg "$SCRATCH_MNT" > >>> +sleep $(($runtime / 2)) > >>> +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" > >>> + > >>> +# Now check if we can finish relocating metadata, which should finish very > >>> +# quickly > >>> +$BTRFS_UTIL_PROG balance start -m "$SCRATCH_MNT" >> $seqres.full > >> > >> Why redirect this one to $seqres.full and not the other balance? What > >> kind of useful information it provides? > > > > Not really much, just an indicator to show that the balance finishes as > > expected. > > And we don't want to output it golden output, as mkfs change may create > > more metadata chunks and cause difference. > > > > For other ones, like the one to be canceled, we don't really care that much. > > > > Thanks, > > Qu > >> > >>> + > >>> +echo "Silence is golden" > >>> + > >>> +# success, all done > >>> +status=0 > >>> +exit > >>> diff --git a/tests/btrfs/213.out b/tests/btrfs/213.out > >>> new file mode 100644 > >>> index 00000000..bd8f2430 > >>> --- /dev/null > >>> +++ b/tests/btrfs/213.out > >>> @@ -0,0 +1,2 @@ > >>> +QA output created by 213 > >>> +Silence is golden > >>> diff --git a/tests/btrfs/group b/tests/btrfs/group > >>> index 8d65bddd..fe4d5bb3 100644 > >>> --- a/tests/btrfs/group > >>> +++ b/tests/btrfs/group > >>> @@ -215,3 +215,4 @@ > >>> 210 auto quick qgroup snapshot > >>> 211 auto quick log prealloc > >>> 212 auto balance dangerous > >>> +213 auto fast balance dangerous > >> > >> fast -> quick > >> > >> Thanks. > >> > >>> -- > >>> 2.26.2 > >>> > >> > >> > > >
diff --git a/tests/btrfs/213 b/tests/btrfs/213 new file mode 100755 index 00000000..f56b0279 --- /dev/null +++ b/tests/btrfs/213 @@ -0,0 +1,64 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2020 SUSE Linux Products GmbH. All Rights Reserved. +# +# FS QA Test 213 +# +# Test if canceling a running balance can lead to dead looping balance +# +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +tmp=/tmp/$$ +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $tmp.* +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# remove previous $seqres.full before test +rm -f $seqres.full + +# Modify as appropriate. +_supported_fs btrfs +_supported_os Linux +_require_scratch +_require_command "$KILLALL_PROG" killall + +_scratch_mkfs >> $seqres.full +_scratch_mount + +runtime=4 + +# Create enough IO so that we need around $runtime seconds to relocate it +dd if=/dev/zero bs=1M of="$SCRATCH_MNT/file" oflag=sync status=none \ + &> /dev/null & +dd_pid=$! +sleep $runtime +"$KILLALL_PROG" -q dd &> /dev/null +wait $dd_pid + +# Now balance should take at least $runtime seconds, we can cancel it at +# $runtime/2 to ensure a success cancel. +$BTRFS_UTIL_PROG balance start -d --bg "$SCRATCH_MNT" +sleep $(($runtime / 2)) +$BTRFS_UTIL_PROG balance cancel "$SCRATCH_MNT" + +# Now check if we can finish relocating metadata, which should finish very +# quickly +$BTRFS_UTIL_PROG balance start -m "$SCRATCH_MNT" >> $seqres.full + +echo "Silence is golden" + +# success, all done +status=0 +exit diff --git a/tests/btrfs/213.out b/tests/btrfs/213.out new file mode 100644 index 00000000..bd8f2430 --- /dev/null +++ b/tests/btrfs/213.out @@ -0,0 +1,2 @@ +QA output created by 213 +Silence is golden diff --git a/tests/btrfs/group b/tests/btrfs/group index 8d65bddd..fe4d5bb3 100644 --- a/tests/btrfs/group +++ b/tests/btrfs/group @@ -215,3 +215,4 @@ 210 auto quick qgroup snapshot 211 auto quick log prealloc 212 auto balance dangerous +213 auto fast balance dangerous
Test if canceling a running balance can cause later balance to dead loop. The ifx is titled "btrfs: relocation: Clear the DEAD_RELOC_TREE bit for orphan roots to prevent runaway balance". Signed-off-by: Qu Wenruo <wqu@suse.com> --- Changelog: v2: - Remove lsof debug output v3: - Remove ps debug output --- tests/btrfs/213 | 64 +++++++++++++++++++++++++++++++++++++++++++++ tests/btrfs/213.out | 2 ++ tests/btrfs/group | 1 + 3 files changed, 67 insertions(+) create mode 100755 tests/btrfs/213 create mode 100644 tests/btrfs/213.out