diff mbox series

[2/2] check: _check_filesystems for errors even if test failed

Message ID 20230414221133.3431400-2-leah.rumancik@gmail.com (mailing list archive)
State New, archived
Headers show
Series [1/2] check: try to fix the test device if it gets corrupted | expand

Commit Message

Leah Rumancik April 14, 2023, 10:11 p.m. UTC
Previously, we would only run _check_filesystems to ensure that a test
that appeared to pass did not have any filesystem corruption. However,
in _check_filesystems, we also repair any errors found in the filesystem.
Let's do this even if we already know the test failed so that subsequent
tests aren't affected.

Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
---
 check | 1 +
 1 file changed, 1 insertion(+)

Comments

Darrick J. Wong April 15, 2023, 12:41 a.m. UTC | #1
On Fri, Apr 14, 2023 at 03:11:33PM -0700, Leah Rumancik wrote:
> Previously, we would only run _check_filesystems to ensure that a test
> that appeared to pass did not have any filesystem corruption. However,
> in _check_filesystems, we also repair any errors found in the filesystem.
> Let's do this even if we already know the test failed so that subsequent
> tests aren't affected.
> 
> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
> ---
>  check | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/check b/check
> index befbf465..c18f02ca 100755
> --- a/check
> +++ b/check
> @@ -972,6 +972,7 @@ function run_section()
>  			# Even though we failed, there may be something interesting in
>  			# dmesg which can help debugging.
>  			_check_dmesg
> +			(_adjust_oom_score 250; _check_filesystems)

Seeing as the test failed, do we care about the state of the scratch fs?
Would it be sufficient only to clean up the test fs to avoid cascading
damage?

(Asking as someone who knows how impactful slow filesystem checking can
be on fstests runtimes... ;))

--D

>  			tc_status="fail"
>  		else
>  			# The test apparently passed, so check for corruption
> -- 
> 2.40.0.634.g4ca3ef3211-goog
>
Leah Rumancik April 17, 2023, 5:19 p.m. UTC | #2
Sure, will send out another set. Thanks!

On Fri, Apr 14, 2023 at 5:41 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Fri, Apr 14, 2023 at 03:11:33PM -0700, Leah Rumancik wrote:
> > Previously, we would only run _check_filesystems to ensure that a test
> > that appeared to pass did not have any filesystem corruption. However,
> > in _check_filesystems, we also repair any errors found in the filesystem.
> > Let's do this even if we already know the test failed so that subsequent
> > tests aren't affected.
> >
> > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
> > ---
> >  check | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/check b/check
> > index befbf465..c18f02ca 100755
> > --- a/check
> > +++ b/check
> > @@ -972,6 +972,7 @@ function run_section()
> >                       # Even though we failed, there may be something interesting in
> >                       # dmesg which can help debugging.
> >                       _check_dmesg
> > +                     (_adjust_oom_score 250; _check_filesystems)
>
> Seeing as the test failed, do we care about the state of the scratch fs?
> Would it be sufficient only to clean up the test fs to avoid cascading
> damage?
>
> (Asking as someone who knows how impactful slow filesystem checking can
> be on fstests runtimes... ;))
>
> --D
>
> >                       tc_status="fail"
> >               else
> >                       # The test apparently passed, so check for corruption
> > --
> > 2.40.0.634.g4ca3ef3211-goog
> >
Leah Rumancik April 17, 2023, 8:08 p.m. UTC | #3
Well actually, if you are using DUMP_CORRUPT_FS, you'd probably want
_check_filesystems to run on the scratch fs as well. This shouldn't
add that much time if most of your tests are passing :) but I could
make this only run for scratch fs on failure only if DUMP_CORRUPT_FS
is set?


On Mon, Apr 17, 2023 at 10:19 AM Leah Rumancik <lrumancik@google.com> wrote:
>
> Sure, will send out another set. Thanks!
>
> On Fri, Apr 14, 2023 at 5:41 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Fri, Apr 14, 2023 at 03:11:33PM -0700, Leah Rumancik wrote:
> > > Previously, we would only run _check_filesystems to ensure that a test
> > > that appeared to pass did not have any filesystem corruption. However,
> > > in _check_filesystems, we also repair any errors found in the filesystem.
> > > Let's do this even if we already know the test failed so that subsequent
> > > tests aren't affected.
> > >
> > > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
> > > ---
> > >  check | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/check b/check
> > > index befbf465..c18f02ca 100755
> > > --- a/check
> > > +++ b/check
> > > @@ -972,6 +972,7 @@ function run_section()
> > >                       # Even though we failed, there may be something interesting in
> > >                       # dmesg which can help debugging.
> > >                       _check_dmesg
> > > +                     (_adjust_oom_score 250; _check_filesystems)
> >
> > Seeing as the test failed, do we care about the state of the scratch fs?
> > Would it be sufficient only to clean up the test fs to avoid cascading
> > damage?
> >
> > (Asking as someone who knows how impactful slow filesystem checking can
> > be on fstests runtimes... ;))
> >
> > --D
> >
> > >                       tc_status="fail"
> > >               else
> > >                       # The test apparently passed, so check for corruption
> > > --
> > > 2.40.0.634.g4ca3ef3211-goog
> > >
Darrick J. Wong April 18, 2023, 4:49 a.m. UTC | #4
On Mon, Apr 17, 2023 at 01:08:20PM -0700, Leah Rumancik wrote:
> Well actually, if you are using DUMP_CORRUPT_FS, you'd probably want
> _check_filesystems to run on the scratch fs as well.

ooh, good point

> This shouldn't
> add that much time if most of your tests are passing :) but I could
> make this only run for scratch fs on failure only if DUMP_CORRUPT_FS
> is set?

Hmm.  On second thought, you've convinced me to go along with this.
The test failed, let's see if corruption happened --> HAPPY DANCE.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> 
> On Mon, Apr 17, 2023 at 10:19 AM Leah Rumancik <lrumancik@google.com> wrote:
> >
> > Sure, will send out another set. Thanks!
> >
> > On Fri, Apr 14, 2023 at 5:41 PM Darrick J. Wong <djwong@kernel.org> wrote:
> > >
> > > On Fri, Apr 14, 2023 at 03:11:33PM -0700, Leah Rumancik wrote:
> > > > Previously, we would only run _check_filesystems to ensure that a test
> > > > that appeared to pass did not have any filesystem corruption. However,
> > > > in _check_filesystems, we also repair any errors found in the filesystem.
> > > > Let's do this even if we already know the test failed so that subsequent
> > > > tests aren't affected.
> > > >
> > > > Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com>
> > > > ---
> > > >  check | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/check b/check
> > > > index befbf465..c18f02ca 100755
> > > > --- a/check
> > > > +++ b/check
> > > > @@ -972,6 +972,7 @@ function run_section()
> > > >                       # Even though we failed, there may be something interesting in
> > > >                       # dmesg which can help debugging.
> > > >                       _check_dmesg
> > > > +                     (_adjust_oom_score 250; _check_filesystems)
> > >
> > > Seeing as the test failed, do we care about the state of the scratch fs?
> > > Would it be sufficient only to clean up the test fs to avoid cascading
> > > damage?
> > >
> > > (Asking as someone who knows how impactful slow filesystem checking can
> > > be on fstests runtimes... ;))
> > >
> > > --D
> > >
> > > >                       tc_status="fail"
> > > >               else
> > > >                       # The test apparently passed, so check for corruption
> > > > --
> > > > 2.40.0.634.g4ca3ef3211-goog
> > > >
diff mbox series

Patch

diff --git a/check b/check
index befbf465..c18f02ca 100755
--- a/check
+++ b/check
@@ -972,6 +972,7 @@  function run_section()
 			# Even though we failed, there may be something interesting in
 			# dmesg which can help debugging.
 			_check_dmesg
+			(_adjust_oom_score 250; _check_filesystems)
 			tc_status="fail"
 		else
 			# The test apparently passed, so check for corruption