diff mbox

common: introduce XFS_IO_AVOID env var

Message ID 1445107518-32022-1-git-send-email-tytso@mit.edu (mailing list archive)
State New, archived
Headers show

Commit Message

Theodore Ts'o Oct. 17, 2015, 6:45 p.m. UTC
Like FSSTRESS_AVOID and FSX_AVOID, XFS_IO_AVOID can be used to avoid
using various advanced file system features such as "fpunch"
"fcollapse", "finsert", or "zero".  Tests that require an xfs_io
command which is included in the space-separated list found in the
XFS_IO_AVOID environment variable will be skipped using _notrun.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
---
 README    | 4 ++++
 common/rc | 5 +++++
 2 files changed, 9 insertions(+)

Comments

Dave Chinner Oct. 18, 2015, 9:37 p.m. UTC | #1
On Sat, Oct 17, 2015 at 02:45:18PM -0400, Theodore Ts'o wrote:
> Like FSSTRESS_AVOID and FSX_AVOID, XFS_IO_AVOID can be used to avoid
> using various advanced file system features such as "fpunch"
> "fcollapse", "finsert", or "zero".  Tests that require an xfs_io
> command which is included in the space-separated list found in the
> XFS_IO_AVOID environment variable will be skipped using _notrun.

This tells me what the change does, not why you need it.

> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> ---
>  README    | 4 ++++
>  common/rc | 5 +++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/README b/README
> index f8a878c..bb0075b 100644
> --- a/README
> +++ b/README
> @@ -78,6 +78,10 @@ Preparing system for tests (IRIX and Linux):
>                 added to the end of fsstresss and fsx invocations, respectively,
>                 in case you wish to exclude certain operational modes from these
>                 tests.
> +	     - setenv XFS_IO_AVOID which may contain a list of space separated
> +	       xfs_io commands which will be avoided in case you want to
> +	       exclude tests that require the use of certain file system
> +	       operations such as "fpunch", "fcollapse", "finsert", or "zero".

Again, what, not why.

>          - or add a case to the switch in common/config assigning
>            these variables based on the hostname of your test
> diff --git a/common/rc b/common/rc
> index adf1edf..61de97e 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -1603,6 +1603,11 @@ _require_xfs_io_command()
>  	fi
>  	command=$1
>  
> +	if echo "$XFS_IO_AVOID" | grep -wq "$command"
> +	then
> +		_notrun "Avoiding xfs_io $command"
> +	fi
> +
>  	testfile=$TEST_DIR/$$.xfs_io
>  	case $command in
>  	"falloc" )

And the following case statement actually tests where the xfs_io
command is supported by the binary and kernel, and notruns if it
isn't supported by either. Why isn't this sufficient?

Can you describe the problem so we can understand why explicitly
avoiding certain xfs_io operations rather than using probing is
necessary?

Cheers,

Dave.
Theodore Ts'o Oct. 18, 2015, 10:08 p.m. UTC | #2
On Mon, Oct 19, 2015 at 08:37:45AM +1100, Dave Chinner wrote:
> On Sat, Oct 17, 2015 at 02:45:18PM -0400, Theodore Ts'o wrote:
> > Like FSSTRESS_AVOID and FSX_AVOID, XFS_IO_AVOID can be used to avoid
> > using various advanced file system features such as "fpunch"
> > "fcollapse", "finsert", or "zero".  Tests that require an xfs_io
> > command which is included in the space-separated list found in the
> > XFS_IO_AVOID environment variable will be skipped using _notrun.
> 
> This tells me what the change does, not why you need it.

We didn't go into as much detail for FSX_AVOID and FSSTRESS_AVOID, but
it's basically for the same reason.  Sometimes we might want to
exclude certain operational modes from the xfststs.

As one example, currently xfstests doesn't understand the difference
between the allocation cluster size and the block size.  In the case
of collapse/insert range, it means operations need to be done in
cluster-aligned chunks, as opposed to block-aligned clusters.  So to
suppress test noise it's nice if we can suppress all attempts to use
insert_range/collapse_range while running a test --- and using
FSX_AVOID and FSSTRESS_AVOID isn't sufficient for this purpose.

(As you may recall, I while back had a test patch allowed the file
system to claim that it doesn't support collapse or insert range, and
you and Eric argued that the right way to handle this was using
environment variables like FSX_AVOID and FSSTRESS_AVOID to cause
xfstests to avoid to use certain fallocate operational modes.
XFS_IO_AVOID is basically an extension of that same functionality.
Without this, hacking the kernel to always fail certain fallocate mode
is still needed in order to force certain tests to be skipped.)

As another example, recently I was trying to debug a memory leak, both
on the most recent version of the kernel, as well as old stable
kernels, and it's very useful to be able to see if a particular bug
might be caused by say, buggy collapse range support.

So I wanted to be able to suppress all use of collapse range by being
able to do something as simple as "gce-xfstests --no-collapse".  This
gets expanded to setting the FSX_AVOID, FSSTRESS_AVOID, and
XFS_IO_AVOID environment variables.

This way it becomes easier to see if certain test failures go away if
an fallocate operational mode is suppressed --- and hence, allows us
to localize the failure to collapse_range or insert_range.

Cheers,

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Oct. 18, 2015, 11:14 p.m. UTC | #3
On Sun, Oct 18, 2015 at 06:08:05PM -0400, Theodore Ts'o wrote:
> On Mon, Oct 19, 2015 at 08:37:45AM +1100, Dave Chinner wrote:
> > On Sat, Oct 17, 2015 at 02:45:18PM -0400, Theodore Ts'o wrote:
> > > Like FSSTRESS_AVOID and FSX_AVOID, XFS_IO_AVOID can be used to avoid
> > > using various advanced file system features such as "fpunch"
> > > "fcollapse", "finsert", or "zero".  Tests that require an xfs_io
> > > command which is included in the space-separated list found in the
> > > XFS_IO_AVOID environment variable will be skipped using _notrun.
> > 
> > This tells me what the change does, not why you need it.
> 
> We didn't go into as much detail for FSX_AVOID and FSSTRESS_AVOID, but
> it's basically for the same reason.  Sometimes we might want to
> exclude certain operational modes from the xfststs.

That's because FSSTRESS_AVOID had existed for a *long time* and
FSX_AVOID was basically the same thing. 

However, chopping out entire tests because they use a specific
xfs_io command is a little different - it can exclude a lot of
explicit correctness tests, rather than just change the pattern of
stress loads by excluding certain operations.

> As one example, currently xfstests doesn't understand the difference
> between the allocation cluster size and the block size.  In the case
> of collapse/insert range, it means operations need to be done in
> cluster-aligned chunks, as opposed to block-aligned clusters.  So to
> suppress test noise it's nice if we can suppress all attempts to use
> insert_range/collapse_range while running a test --- and using
> FSX_AVOID and FSSTRESS_AVOID isn't sufficient for this purpose.

Clearly they weren't designed for that purpose, either. :)

> So I wanted to be able to suppress all use of collapse range by being
> able to do something as simple as "gce-xfstests --no-collapse".  This
> gets expanded to setting the FSX_AVOID, FSSTRESS_AVOID, and
> XFS_IO_AVOID environment variables.

This is exactly what the "-x group" CLI option is for: To exclude
specific groups of tests from running, such as:

# ./check -g auto -x fcollapse

i.e. selection/exclusion of tests is done by group classifications
in tests/*/group, not environment variables. Update the group files,
and everything will work just fine for you.

FWIW, any test that uses a fallocatei() based command should also be
in the prealloc group. Can you update the group files to ensure
these tests are tagged with prealloc at the same time?

Cheers,

Dave.
Theodore Ts'o Oct. 19, 2015, 12:10 a.m. UTC | #4
On Mon, Oct 19, 2015 at 10:14:02AM +1100, Dave Chinner wrote:
> 
> However, chopping out entire tests because they use a specific
> xfs_io command is a little different - it can exclude a lot of
> explicit correctness tests, rather than just change the pattern of
> stress loads by excluding certain operations.

Well, if a test which is testing collapse_range, and we chop it out,
then yes, we are excluding an explicit correctness test (for collapse
range) --- but that's the whole point.  I guess what you're saying is
that by excluding tests that require "collapse range", we might be
excluding something that is testing core file system correctness
beyond just, say, "collapse range" or "insert range".  But are there
really tests that fall into that category?

> This is exactly what the "-x group" CLI option is for: To exclude
> specific groups of tests from running, such as:
> 
> # ./check -g auto -x fcollapse
> 
> i.e. selection/exclusion of tests is done by group classifications
> in tests/*/group, not environment variables. Update the group files,
> and everything will work just fine for you.

The tests already have that information encoded in the
'_require_xfs_io_command "fcollapse"' assertion in the test file.  We
could update the group files, but that's a manual way of solving a
database synchronization problem.  This is why I liked the
XFS_AIO_AVOID solution, since the information of which tests use
fcollapse or finsert is already realiably available in one place ---
in the test script.

It also means that when people create new tests, they need to know
what groups should be included in the test's group file entry, or code
reviewers have to manually check to make sure the group file is
properly updated when each new test is added.  Which leads me to...

> FWIW, any test that uses a fallocatei() based command should also be
> in the prealloc group. Can you update the group files to ensure
> these tests are tagged with prealloc at the same time?

Is there documentation about what all of the groups are supposed to
mean, and what groups a new test should be part of?

For example, what is the precise meaning of the "dangerous" group?
I've recently tried running all of the dangerous group tests against
3.10.89, 3.14.53, 3.18.21, 4.1.8, and 4.3-rc2, and none of the caused
the kernels to crash when using ext4.  I was about to submit a patch
to remove the ext4/30[1234] tests from the "dangerous" group and add
them to the "auto" group.

I wasn't sure what to do with generic/019, generic/068, and
generic/280, which are in the dangerous group, and are passing for
ext4, but may be causing other file systems to crash.  But part of
that is I wasn't sure what the formal definition of "dangerous" is
supposed to mean.

Similarly, it didn't even *occur* to me that say, a test which uses
fpunch should be part of the prealloc group.  Yes, it's an fallocate()
based command, but it otherwise has no connection to preallocation.
I'm sure there's probably some very good historical, and possibly
still relevant reason today for that definition of "prealloc"; but it
certainly isn't obvious to me.

Cheers,

						- Ted

P.S.  More questions --- what is the meaning of the "ioctl" group
supposed to be, and what is its intended use?  And is there a reason
why we don't have a "defrag" group?  More generally, what's the
criteria used to evaluate the appropriateness of a new proposed group?
--
To unsubscribe from this list: send the line "unsubscribe fstests" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Chinner Oct. 19, 2015, 2:29 a.m. UTC | #5
On Sun, Oct 18, 2015 at 08:10:58PM -0400, Theodore Ts'o wrote:
> On Mon, Oct 19, 2015 at 10:14:02AM +1100, Dave Chinner wrote:
> > 
> > However, chopping out entire tests because they use a specific
> > xfs_io command is a little different - it can exclude a lot of
> > explicit correctness tests, rather than just change the pattern of
> > stress loads by excluding certain operations.
> 
> Well, if a test which is testing collapse_range, and we chop it out,
> then yes, we are excluding an explicit correctness test (for collapse
> range) --- but that's the whole point.  I guess what you're saying is
> that by excluding tests that require "collapse range", we might be
> excluding something that is testing core file system correctness
> beyond just, say, "collapse range" or "insert range".  But are there
> really tests that fall into that category?

No idea, and I don't really care, either.

> > This is exactly what the "-x group" CLI option is for: To exclude
> > specific groups of tests from running, such as:
> > 
> > # ./check -g auto -x fcollapse
> > 
> > i.e. selection/exclusion of tests is done by group classifications
> > in tests/*/group, not environment variables. Update the group files,
> > and everything will work just fine for you.
> 
> The tests already have that information encoded in the
> '_require_xfs_io_command "fcollapse"' assertion in the test file.

Yes, but that's so the "auto" group does the right thing when it
comes across the test and it's being run on a platform/fs that
doesn't support that functionality.

You want to manually control what tests you run, and we already
have generic mechanisms for that. Indeed, the external expunge file
was added precisely so you could maintain these expunge lists easily
yourself in your kvm/gce test harness:

commit 9f3515572c4939674bdb582e26ca12baa1575416
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Tue May 13 09:04:13 2014 +1000

    check: add support for an external file containing tests to exclude
    
    Currently the -X option is intended to specify a set of expunging
    files which are stored in each test/* subdirectory.  As described in
    the commit description for 0b1e8abd4, in order to exclude the test
    generic/280, the -X option is used as follows:
    
        $ cat tests/generic/3.0-stable-avoid
        280
        $ sudo ./check -X 3.0-stable-avoid generic/280
    
    However, it is sometimes useful to store the set of expunged tests in
    a single file, outside of tests/* subdirectories.  This commit enables
    the following:
    
        $ cat /root/conf/data_journal.exclude
        generic/068
        ext4/301
        $ sudo ./check -E /root/conf/data_journal.exclude -g auto
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Dave Chinner <david@fromorbit.com>

We don't need mulitple mechanisms to implement the same
funtionality in slightly different contexts.

> It also means that when people create new tests, they need to know
> what groups should be included in the test's group file entry, or code
> reviewers have to manually check to make sure the group file is
> properly updated when each new test is added.  Which leads me to...

That would be your responsibility to keep up to date. i.e. you'd
need to catch group misuse as a reviewer, or send patches to fix it
up after the fact. If you need something, then you keep it working
and make sure people learn how to use it properly.

> > FWIW, any test that uses a fallocatei() based command should also be
> > in the prealloc group. Can you update the group files to ensure
> > these tests are tagged with prealloc at the same time?
> 
> Is there documentation about what all of the groups are supposed to
> mean, and what groups a new test should be part of?

It's all pretty obvious, yes?

auto = expected to work as a regression test
quick = runs fast
rw = does read/write IO operations
aio = does AIO
metadata = tests/stresses metadata operations
prealloc = does extent manipulations
enospc = exercises operation at enospc
acl = ACL tests
attr = extended attribute tests
freeze = filesystem freeze tests
....

and so on.

> For example, what is the precise meaning of the "dangerous" group?

"Test is likely to oops/hang the kernel and prevent subsequent tests
from being run and/or completing correctly."

> I've recently tried running all of the dangerous group tests against
> 3.10.89, 3.14.53, 3.18.21, 4.1.8, and 4.3-rc2, and none of the caused
> the kernels to crash when using ext4.  I was about to submit a patch
> to remove the ext4/30[1234] tests from the "dangerous" group and add
> them to the "auto" group.

So whatever bugs they were triggering have been fixed? If so, send a
patch to add them to the auto group.

> I wasn't sure what to do with generic/019, generic/068, and
> generic/280, which are in the dangerous group, and are passing for
> ext4, but may be causing other file systems to crash.  But part of
> that is I wasn't sure what the formal definition of "dangerous" is
> supposed to mean.

generic/068 and /280 are already part of the auto group, indicating
that they work fine on current kernels, but are likely to hang/crash
older kernels.

As for generic/019, it's dangerous for other filesystems, especially
XFS w/ CONFIG_XFS_DEBUG=y as it detects a in-memory corruption
caused by shutdown detection in the middle of a trnasaction that is
then aborted.  That causes an ASSERT failure and hence an oops and
the machine is brought down. So, no auto group for that test until
that problem is fixed.

> Similarly, it didn't even *occur* to me that say, a test which uses
> fpunch should be part of the prealloc group.  Yes, it's an fallocate()
> based command, but it otherwise has no connection to preallocation.

So now you know - it's been the placeholder for direct extent
manipulation tests for many, many years, whether they be allocating
extents, punching them out or even querying them
(xfs_bmap/fiemap)...

> I'm sure there's probably some very good historical, and possibly
> still relevant reason today for that definition of "prealloc"; but it
> certainly isn't obvious to me.

So write a patch to document the group names and intended categories
of tests they should contain, and another to change the name of the
prealloc group to something like "extent_ops" so it's easier to
understand.

Cheers,

Dave.
diff mbox

Patch

diff --git a/README b/README
index f8a878c..bb0075b 100644
--- a/README
+++ b/README
@@ -78,6 +78,10 @@  Preparing system for tests (IRIX and Linux):
                added to the end of fsstresss and fsx invocations, respectively,
                in case you wish to exclude certain operational modes from these
                tests.
+	     - setenv XFS_IO_AVOID which may contain a list of space separated
+	       xfs_io commands which will be avoided in case you want to
+	       exclude tests that require the use of certain file system
+	       operations such as "fpunch", "fcollapse", "finsert", or "zero".
 
         - or add a case to the switch in common/config assigning
           these variables based on the hostname of your test
diff --git a/common/rc b/common/rc
index adf1edf..61de97e 100644
--- a/common/rc
+++ b/common/rc
@@ -1603,6 +1603,11 @@  _require_xfs_io_command()
 	fi
 	command=$1
 
+	if echo "$XFS_IO_AVOID" | grep -wq "$command"
+	then
+		_notrun "Avoiding xfs_io $command"
+	fi
+
 	testfile=$TEST_DIR/$$.xfs_io
 	case $command in
 	"falloc" )