| Message ID | 20221026165747.1146281-1-zlang@kernel.org (mailing list archive) |
|---|---|
| State | New, archived |
| Series | xfs: new test to ensure xfs can capture IO errors correctly |
On Thu, Oct 27, 2022 at 12:57:47AM +0800, Zorro Lang wrote:
> There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure
> we capture IO errors correctly"), so trys to cover this bug and make
> sure xfs can capture IO errors correctly, won't panic and hang again.
> 
> Signed-off-by: Zorro Lang <zlang@redhat.com>
> ---
> 
> Hi,
> 
> When I tried to tidy up our internal test cases recently, I found a very
> old case which trys to cover e001873853d8 ("xfs: ensure we capture IO errors
> correctly") which fix by Dave. At that time, we didn't support xfs injection,
> so we tested it by a systemtap script [1] to inject an ioerror.
> 
> Now this bug has been fixed long long time ago (9+ years), and that stap script
> is already out of date, can't work with new kernel. But good news is we have xfs
> injection now, so I try to resume this test case in fstests.
> 
> I didn't verify if this case can reproduce that bug on old rhel (which doesn't
> support error injection). The original case tried to inject errno 11, I'm
> not sure if it's worth trying more other errors. I searched "buf_ioerror" in
> fstests, found nothing. So maybe this bug is old enough, but it's worth covering
> this kind of test. So feel free to tell me if you have any suggestions :)
> 
> Thanks,
> Zorro
> 
> [1]
> probe module("xfs").function("xfs_buf_bio_end_io")
> {
> 	if ($error == 0) {
> 		if ($bio->bi_rw & (1 << 4)) {
> 			$error = -11;
> 			printf("%s: comm %s, pid %d, setting error 11\n",
> 				probefunc(), execname(), pid());
> 			print_stack(backtrace());
> 		}
> 	}
> }
> 
>  tests/xfs/554     | 53 +++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/554.out |  4 ++++
>  2 files changed, 57 insertions(+)
>  create mode 100755 tests/xfs/554
>  create mode 100644 tests/xfs/554.out
> 
> diff --git a/tests/xfs/554 b/tests/xfs/554
> new file mode 100755
> index 00000000..6935bfc0
> --- /dev/null
> +++ b/tests/xfs/554
> @@ -0,0 +1,53 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022 YOUR NAME HERE. All Rights Reserved.

Mr. YOUR HERE,

Please write your real name in the copyright statement.

> +#
> +# FS QA Test 554
> +#
> +# There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure we
> +# capture IO errors correctly"), so trys to cover this bug and make sure
> +# xfs can capture IO errors correctly, won't panic and hang again.
> +#
> +. ./common/preamble
> +_begin_fstest auto eio
> +
> +_cleanup()
> +{
> +	$KILLALL_PROG -q fsstress 2> /dev/null
> +	# ensures all fsstress processes died
> +	wait
> +	# log replay, due to the buf_ioerror injection might leave dirty log
> +	_scratch_cycle_mount
> +	cd /
> +	rm -r -f $tmp.*
> +}
> +
> +# Import common functions.
> +. ./common/inject
> +
> +# real QA test starts here
> +_supported_fs xfs
> +_require_command "$KILLALL_PROG" "killall"
> +_require_scratch
> +_require_xfs_debug
> +_require_xfs_io_error_injection "buf_ioerror"
> +
> +_scratch_mkfs >> $seqres.full
> +_scratch_mount
> +
> +echo "Inject buf ioerror tag"
> +_scratch_inject_error buf_ioerror 11
> +
> +echo "Random I/Os testing ..."
> +$FSSTRESS_PROG $FSSTRESS_AVOID -d $SCRATCH_MNT -n 50000 -p 100 >> $seqres.full &
> +for ((i=0; i<5; i++));do
> +	# Clear caches, then trys to use 'find' to trigger readahead

BUF_IOERROR only seems to apply to async writes:

static void
xfs_buf_bio_end_io(
	struct bio		*bio)
{
	struct xfs_buf		*bp = (struct xfs_buf *)bio->bi_private;

	if (!bio->bi_status &&
	    (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
	    XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
		bio->bi_status = BLK_STS_IOERR;

So I don't see how this would reproduce the problem of b_error not being
cleared after a failed readahead and re-read?

--D

> +	echo 3 > /proc/sys/vm/drop_caches
> +	find $SCRATCH_MNT >/dev/null 2>&1
> +	sleep 3
> +done
> +
> +echo "No hang or panic"
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/xfs/554.out b/tests/xfs/554.out
> new file mode 100644
> index 00000000..26910daa
> --- /dev/null
> +++ b/tests/xfs/554.out
> @@ -0,0 +1,4 @@
> +QA output created by 554
> +Inject buf ioerror tag
> +Random I/Os testing ...
> +No hang or panic
> -- 
> 2.31.1
> 
On Wed, Oct 26, 2022 at 11:30:29AM -0700, Darrick J. Wong wrote:
> On Thu, Oct 27, 2022 at 12:57:47AM +0800, Zorro Lang wrote:
> > There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure
> > we capture IO errors correctly"), so trys to cover this bug and make
> > sure xfs can capture IO errors correctly, won't panic and hang again.

[...]

> > +echo "Random I/Os testing ..."
> > +$FSSTRESS_PROG $FSSTRESS_AVOID -d $SCRATCH_MNT -n 50000 -p 100 >> $seqres.full &
> > +for ((i=0; i<5; i++));do
> > +	# Clear caches, then trys to use 'find' to trigger readahead
> 
> BUF_IOERROR only seems to apply to async writes:
> 
> static void
> xfs_buf_bio_end_io(
> 	struct bio		*bio)
> {
> 	struct xfs_buf		*bp = (struct xfs_buf *)bio->bi_private;
> 
> 	if (!bio->bi_status &&
> 	    (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
> 	    XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
> 		bio->bi_status = BLK_STS_IOERR;
> 
> So I don't see how this would reproduce the problem of b_error not being
> cleared after a failed readahead and re-read?

Oh, "bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC)" ... so I don't
have chance to cover this bug? I have to abandon this patch, or we'd like to
change it to be a general async ioerror injection test.

Thanks,
Zorro

> 
> --D
> 
[...]
On Thu, Oct 27, 2022 at 10:24:59AM +0800, Zorro Lang wrote:
> On Wed, Oct 26, 2022 at 11:30:29AM -0700, Darrick J. Wong wrote:
> > On Thu, Oct 27, 2022 at 12:57:47AM +0800, Zorro Lang wrote:
> > > There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure
> > > we capture IO errors correctly"), so trys to cover this bug and make
> > > sure xfs can capture IO errors correctly, won't panic and hang again.

[...]

> > > +for ((i=0; i<5; i++));do
> > > +	# Clear caches, then trys to use 'find' to trigger readahead
> > 
> > BUF_IOERROR only seems to apply to async writes:
> > 
> > static void
> > xfs_buf_bio_end_io(
> > 	struct bio		*bio)
> > {
> > 	struct xfs_buf		*bp = (struct xfs_buf *)bio->bi_private;
> > 
> > 	if (!bio->bi_status &&
> > 	    (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
> > 	    XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
> > 		bio->bi_status = BLK_STS_IOERR;
> > 
> > So I don't see how this would reproduce the problem of b_error not being
> > cleared after a failed readahead and re-read?
> 
> Oh, "bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC)" ... so I don't
> have chance to cover this bug? I have to abandon this patch, or we'd like to
> change it to be a general async ioerror injection test.

Well you /could/ add a new knob to make readahead fail, that's probably an
interesting case that doesn't get tested much.

--D

> Thanks,
> Zorro
> 
[...]
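A knob like the one Darrick suggests would presumably sit next to the existing check in xfs_buf_bio_end_io() quoted above, keyed off read buffers instead of async writes. The sketch below is only an illustration of that idea: XFS_ERRTAG_BUF_READ_IOERROR is a hypothetical tag name invented here for the example, not an error tag that exists in the kernel under discussion.

/*
 * Illustrative sketch only: extend the buffer I/O completion path so a
 * hypothetical XFS_ERRTAG_BUF_READ_IOERROR tag can fail read/readahead
 * completions, mirroring the existing write-only XFS_ERRTAG_BUF_IOERROR.
 */
static void
xfs_buf_bio_end_io(
	struct bio		*bio)
{
	struct xfs_buf		*bp = (struct xfs_buf *)bio->bi_private;

	/* existing knob: only async write completions can be failed */
	if (!bio->bi_status &&
	    (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
	    XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
		bio->bi_status = BLK_STS_IOERR;

	/*
	 * hypothetical knob: also fail read completions, which would cover
	 * readahead buffers (XBF_READ, typically with XBF_READ_AHEAD set)
	 */
	if (!bio->bi_status && (bp->b_flags & XBF_READ) &&
	    XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_READ_IOERROR))
		bio->bi_status = BLK_STS_IOERR;

	/* ... remainder of the completion path unchanged ... */
}

With something along those lines in place, the drop_caches plus 'find' loop in the proposed test would actually drive the failed-readahead-then-reread path that e001873853d8 fixed, rather than only the async write path.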
diff --git a/tests/xfs/554 b/tests/xfs/554
new file mode 100755
index 00000000..6935bfc0
--- /dev/null
+++ b/tests/xfs/554
@@ -0,0 +1,53 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 YOUR NAME HERE. All Rights Reserved.
+#
+# FS QA Test 554
+#
+# There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure we
+# capture IO errors correctly"), so trys to cover this bug and make sure
+# xfs can capture IO errors correctly, won't panic and hang again.
+#
+. ./common/preamble
+_begin_fstest auto eio
+
+_cleanup()
+{
+	$KILLALL_PROG -q fsstress 2> /dev/null
+	# ensures all fsstress processes died
+	wait
+	# log replay, due to the buf_ioerror injection might leave dirty log
+	_scratch_cycle_mount
+	cd /
+	rm -r -f $tmp.*
+}
+
+# Import common functions.
+. ./common/inject
+
+# real QA test starts here
+_supported_fs xfs
+_require_command "$KILLALL_PROG" "killall"
+_require_scratch
+_require_xfs_debug
+_require_xfs_io_error_injection "buf_ioerror"
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount
+
+echo "Inject buf ioerror tag"
+_scratch_inject_error buf_ioerror 11
+
+echo "Random I/Os testing ..."
+$FSSTRESS_PROG $FSSTRESS_AVOID -d $SCRATCH_MNT -n 50000 -p 100 >> $seqres.full &
+for ((i=0; i<5; i++));do
+	# Clear caches, then trys to use 'find' to trigger readahead
+	echo 3 > /proc/sys/vm/drop_caches
+	find $SCRATCH_MNT >/dev/null 2>&1
+	sleep 3
+done
+
+echo "No hang or panic"
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/554.out b/tests/xfs/554.out
new file mode 100644
index 00000000..26910daa
--- /dev/null
+++ b/tests/xfs/554.out
@@ -0,0 +1,4 @@
+QA output created by 554
+Inject buf ioerror tag
+Random I/Os testing ...
+No hang or panic
There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure
we capture IO errors correctly"), so trys to cover this bug and make
sure xfs can capture IO errors correctly, won't panic and hang again.

Signed-off-by: Zorro Lang <zlang@redhat.com>
---

Hi,

When I tried to tidy up our internal test cases recently, I found a very
old case which trys to cover e001873853d8 ("xfs: ensure we capture IO errors
correctly") which fix by Dave. At that time, we didn't support xfs injection,
so we tested it by a systemtap script [1] to inject an ioerror.

Now this bug has been fixed long long time ago (9+ years), and that stap script
is already out of date, can't work with new kernel. But good news is we have xfs
injection now, so I try to resume this test case in fstests.

I didn't verify if this case can reproduce that bug on old rhel (which doesn't
support error injection). The original case tried to inject errno 11, I'm
not sure if it's worth trying more other errors. I searched "buf_ioerror" in
fstests, found nothing. So maybe this bug is old enough, but it's worth covering
this kind of test. So feel free to tell me if you have any suggestions :)

Thanks,
Zorro

[1]
probe module("xfs").function("xfs_buf_bio_end_io")
{
	if ($error == 0) {
		if ($bio->bi_rw & (1 << 4)) {
			$error = -11;
			printf("%s: comm %s, pid %d, setting error 11\n",
				probefunc(), execname(), pid());
			print_stack(backtrace());
		}
	}
}

 tests/xfs/554     | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/554.out |  4 ++++
 2 files changed, 57 insertions(+)
 create mode 100755 tests/xfs/554
 create mode 100644 tests/xfs/554.out