| Message ID | 20221026165747.1146281-1-zlang@kernel.org (mailing list archive) |
|---|---|
| State | New, archived |
| Series | xfs: new test to ensure xfs can capture IO errors correctly |
On Thu, Oct 27, 2022 at 12:57:47AM +0800, Zorro Lang wrote:
> There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure
> we capture IO errors correctly"), so trys to cover this bug and make
> sure xfs can capture IO errors correctly, won't panic and hang again.
> 
> Signed-off-by: Zorro Lang <zlang@redhat.com>
> ---
> 
> Hi,
> 
> When I tried to tidy up our internal test cases recently, I found a very
> old case which trys to cover e001873853d8 ("xfs: ensure we capture IO errors
> correctly") which fix by Dave. At that time, we didn't support xfs injection,
> so we tested it by a systemtap script [1] to inject an ioerror.
> 
> Now this bug has been fixed long long time ago (9+ years), and that stap script
> is already out of date, can't work with new kernel. But good news is we have xfs
> injection now, so I try to resume this test case in fstests.
> 
> I didn't verify if this case can reproduce that bug on old rhel (which doesn't
> support error injection). The original case tried to inject errno 11, I'm
> not sure if it's worth trying more other errors. I searched "buf_ioerror" in
> fstests, found nothing. So maybe this bug is old enough, but it's worth covering
> this kind of test. So feel free to tell me if you have any suggestions :)
> 
> Thanks,
> Zorro
> 
> [1]
> probe module("xfs").function("xfs_buf_bio_end_io")
> {
> 	if ($error == 0) {
> 		if ($bio->bi_rw & (1 << 4)) {
> 			$error = -11;
> 			printf("%s: comm %s, pid %d, setting error 11\n",
> 				probefunc(), execname(), pid());
> 			print_stack(backtrace());
> 		}
> 	}
> }
> 
>  tests/xfs/554     | 53 +++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/554.out |  4 ++++
>  2 files changed, 57 insertions(+)
>  create mode 100755 tests/xfs/554
>  create mode 100644 tests/xfs/554.out
> 
> diff --git a/tests/xfs/554 b/tests/xfs/554
> new file mode 100755
> index 00000000..6935bfc0
> --- /dev/null
> +++ b/tests/xfs/554
> @@ -0,0 +1,53 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2022 YOUR NAME HERE. All Rights Reserved.

Mr. YOUR HERE,

Please write your real name in the copyright statement.

> +#
> +# FS QA Test 554
> +#
> +# There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure we
> +# capture IO errors correctly"), so trys to cover this bug and make sure
> +# xfs can capture IO errors correctly, won't panic and hang again.
> +#
> +. ./common/preamble
> +_begin_fstest auto eio
> +
> +_cleanup()
> +{
> +	$KILLALL_PROG -q fsstress 2> /dev/null
> +	# ensures all fsstress processes died
> +	wait
> +	# log replay, due to the buf_ioerror injection might leave dirty log
> +	_scratch_cycle_mount
> +	cd /
> +	rm -r -f $tmp.*
> +}
> +
> +# Import common functions.
> +. ./common/inject
> +
> +# real QA test starts here
> +_supported_fs xfs
> +_require_command "$KILLALL_PROG" "killall"
> +_require_scratch
> +_require_xfs_debug
> +_require_xfs_io_error_injection "buf_ioerror"
> +
> +_scratch_mkfs >> $seqres.full
> +_scratch_mount
> +
> +echo "Inject buf ioerror tag"
> +_scratch_inject_error buf_ioerror 11
> +
> +echo "Random I/Os testing ..."
> +$FSSTRESS_PROG $FSSTRESS_AVOID -d $SCRATCH_MNT -n 50000 -p 100 >> $seqres.full &
> +for ((i=0; i<5; i++));do
> +	# Clear caches, then trys to use 'find' to trigger readahead

BUF_IOERROR only seems to apply to async writes:

static void
xfs_buf_bio_end_io(
	struct bio		*bio)
{
	struct xfs_buf		*bp = (struct xfs_buf *)bio->bi_private;

	if (!bio->bi_status &&
	    (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
	    XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
		bio->bi_status = BLK_STS_IOERR;

So I don't see how this would reproduce the problem of b_error not being
cleared after a failed readahead and re-read?

--D

> +	echo 3 > /proc/sys/vm/drop_caches
> +	find $SCRATCH_MNT >/dev/null 2>&1
> +	sleep 3
> +done
> +
> +echo "No hang or panic"
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/xfs/554.out b/tests/xfs/554.out
> new file mode 100644
> index 00000000..26910daa
> --- /dev/null
> +++ b/tests/xfs/554.out
> @@ -0,0 +1,4 @@
> +QA output created by 554
> +Inject buf ioerror tag
> +Random I/Os testing ...
> +No hang or panic
> -- 
> 2.31.1
> 
On Wed, Oct 26, 2022 at 11:30:29AM -0700, Darrick J. Wong wrote:
> On Thu, Oct 27, 2022 at 12:57:47AM +0800, Zorro Lang wrote:
> > There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure
> > we capture IO errors correctly"), so trys to cover this bug and make
> > sure xfs can capture IO errors correctly, won't panic and hang again.

[...]

> > +echo "Random I/Os testing ..."
> > +$FSSTRESS_PROG $FSSTRESS_AVOID -d $SCRATCH_MNT -n 50000 -p 100 >> $seqres.full &
> > +for ((i=0; i<5; i++));do
> > +	# Clear caches, then trys to use 'find' to trigger readahead
> 
> BUF_IOERROR only seems to apply to async writes:
> 
> static void
> xfs_buf_bio_end_io(
> 	struct bio		*bio)
> {
> 	struct xfs_buf		*bp = (struct xfs_buf *)bio->bi_private;
> 
> 	if (!bio->bi_status &&
> 	    (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
> 	    XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
> 		bio->bi_status = BLK_STS_IOERR;
> 
> So I don't see how this would reproduce the problem of b_error not being
> cleared after a failed readahead and re-read?

Oh, "bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC)" ... so I don't
have chance to cover this bug? I have to abandon this patch, or we'd like to
change it to be a general async ioerror injection test.

Thanks,
Zorro

> 
> --D
> 
[...]
On Thu, Oct 27, 2022 at 10:24:59AM +0800, Zorro Lang wrote:
> On Wed, Oct 26, 2022 at 11:30:29AM -0700, Darrick J. Wong wrote:
> > On Thu, Oct 27, 2022 at 12:57:47AM +0800, Zorro Lang wrote:
> > > There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure
> > > we capture IO errors correctly"), so trys to cover this bug and make
> > > sure xfs can capture IO errors correctly, won't panic and hang again.

[...]

> > > +for ((i=0; i<5; i++));do
> > > +	# Clear caches, then trys to use 'find' to trigger readahead
> > 
> > BUF_IOERROR only seems to apply to async writes:
> > 
> > static void
> > xfs_buf_bio_end_io(
> > 	struct bio		*bio)
> > {
> > 	struct xfs_buf		*bp = (struct xfs_buf *)bio->bi_private;
> > 
> > 	if (!bio->bi_status &&
> > 	    (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
> > 	    XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
> > 		bio->bi_status = BLK_STS_IOERR;
> > 
> > So I don't see how this would reproduce the problem of b_error not being
> > cleared after a failed readahead and re-read?
> 
> Oh, "bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC)" ... so I don't
> have chance to cover this bug? I have to abandon this patch, or we'd like to
> change it to be a general async ioerror injection test.

Well you /could/ add a new knob to make readahead fail, that's probably an
interesting case that doesn't get tested much.

--D

> Thanks,
> Zorro
> 
[...]
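A knob like the one Darrick suggests would presumably sit next to the existing check in xfs_buf_bio_end_io() quoted above, keyed off read buffers instead of async writes. The sketch below is only an illustration of that idea: XFS_ERRTAG_BUF_READ_IOERROR is a hypothetical tag name invented here for the example, not an error tag that exists in the kernel under discussion.

/*
 * Illustrative sketch only: extend the buffer I/O completion path so a
 * hypothetical XFS_ERRTAG_BUF_READ_IOERROR tag can fail read/readahead
 * completions, mirroring the existing write-only XFS_ERRTAG_BUF_IOERROR.
 */
static void
xfs_buf_bio_end_io(
	struct bio		*bio)
{
	struct xfs_buf		*bp = (struct xfs_buf *)bio->bi_private;

	/* existing knob: only async write completions can be failed */
	if (!bio->bi_status &&
	    (bp->b_flags & XBF_WRITE) && (bp->b_flags & XBF_ASYNC) &&
	    XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_IOERROR))
		bio->bi_status = BLK_STS_IOERR;

	/*
	 * hypothetical knob: also fail read completions, which would cover
	 * readahead buffers (XBF_READ, typically with XBF_READ_AHEAD set)
	 */
	if (!bio->bi_status && (bp->b_flags & XBF_READ) &&
	    XFS_TEST_ERROR(false, bp->b_mount, XFS_ERRTAG_BUF_READ_IOERROR))
		bio->bi_status = BLK_STS_IOERR;

	/* ... remainder of the completion path unchanged ... */
}

With something along those lines in place, the drop_caches plus 'find' loop in the proposed test would actually drive the failed-readahead-then-reread path that e001873853d8 fixed, rather than only the async write path.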
diff --git a/tests/xfs/554 b/tests/xfs/554
new file mode 100755
index 00000000..6935bfc0
--- /dev/null
+++ b/tests/xfs/554
@@ -0,0 +1,53 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022 YOUR NAME HERE. All Rights Reserved.
+#
+# FS QA Test 554
+#
+# There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure we
+# capture IO errors correctly"), so trys to cover this bug and make sure
+# xfs can capture IO errors correctly, won't panic and hang again.
+#
+. ./common/preamble
+_begin_fstest auto eio
+
+_cleanup()
+{
+	$KILLALL_PROG -q fsstress 2> /dev/null
+	# ensures all fsstress processes died
+	wait
+	# log replay, due to the buf_ioerror injection might leave dirty log
+	_scratch_cycle_mount
+	cd /
+	rm -r -f $tmp.*
+}
+
+# Import common functions.
+. ./common/inject
+
+# real QA test starts here
+_supported_fs xfs
+_require_command "$KILLALL_PROG" "killall"
+_require_scratch
+_require_xfs_debug
+_require_xfs_io_error_injection "buf_ioerror"
+
+_scratch_mkfs >> $seqres.full
+_scratch_mount
+
+echo "Inject buf ioerror tag"
+_scratch_inject_error buf_ioerror 11
+
+echo "Random I/Os testing ..."
+$FSSTRESS_PROG $FSSTRESS_AVOID -d $SCRATCH_MNT -n 50000 -p 100 >> $seqres.full &
+for ((i=0; i<5; i++));do
+	# Clear caches, then trys to use 'find' to trigger readahead
+	echo 3 > /proc/sys/vm/drop_caches
+	find $SCRATCH_MNT >/dev/null 2>&1
+	sleep 3
+done
+
+echo "No hang or panic"
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/554.out b/tests/xfs/554.out
new file mode 100644
index 00000000..26910daa
--- /dev/null
+++ b/tests/xfs/554.out
@@ -0,0 +1,4 @@
+QA output created by 554
+Inject buf ioerror tag
+Random I/Os testing ...
+No hang or panic
There was a known xfs crash bug fixed by e001873853d8 ("xfs: ensure
we capture IO errors correctly"), so trys to cover this bug and make
sure xfs can capture IO errors correctly, won't panic and hang again.

Signed-off-by: Zorro Lang <zlang@redhat.com>
---

Hi,

When I tried to tidy up our internal test cases recently, I found a very
old case which trys to cover e001873853d8 ("xfs: ensure we capture IO errors
correctly") which fix by Dave. At that time, we didn't support xfs injection,
so we tested it by a systemtap script [1] to inject an ioerror.

Now this bug has been fixed long long time ago (9+ years), and that stap script
is already out of date, can't work with new kernel. But good news is we have xfs
injection now, so I try to resume this test case in fstests.

I didn't verify if this case can reproduce that bug on old rhel (which doesn't
support error injection). The original case tried to inject errno 11, I'm
not sure if it's worth trying more other errors. I searched "buf_ioerror" in
fstests, found nothing. So maybe this bug is old enough, but it's worth covering
this kind of test. So feel free to tell me if you have any suggestions :)

Thanks,
Zorro

[1]
probe module("xfs").function("xfs_buf_bio_end_io")
{
	if ($error == 0) {
		if ($bio->bi_rw & (1 << 4)) {
			$error = -11;
			printf("%s: comm %s, pid %d, setting error 11\n",
				probefunc(), execname(), pid());
			print_stack(backtrace());
		}
	}
}

 tests/xfs/554     | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/554.out |  4 ++++
 2 files changed, 57 insertions(+)
 create mode 100755 tests/xfs/554
 create mode 100644 tests/xfs/554.out