Message ID | 167243825203.682859.1144819928544539264.stgit@magnolia (mailing list archive) |
---|---|
State | New, archived |
Series | xfs: design documentation for online fsck |
On Fri, 2022-12-30 at 14:10 -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Start the third chapter of the online fsck design documentation.
> This
> covers the testing plan to make sure that both online and offline
> fsck
> can detect arbitrary problems and correct them without making things
> worse.
>
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  .../filesystems/xfs-online-fsck-design.rst | 187
> ++++++++++++++++++++
>  1 file changed, 187 insertions(+)
>
>
> diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst
> b/Documentation/filesystems/xfs-online-fsck-design.rst
> index a03a7b9f0250..d630b6bdbe4a 100644
> --- a/Documentation/filesystems/xfs-online-fsck-design.rst
> +++ b/Documentation/filesystems/xfs-online-fsck-design.rst
> @@ -563,3 +563,190 @@ functionality.
>  Many of these risks are inherent to software programming.
>  Despite this, it is hoped that this new functionality will prove
> useful in
>  reducing unexpected downtime.
> +
> +3. Testing Plan
> +===============
> +
> +As stated before, fsck tools have three main goals:
> +
> +1. Detect inconsistencies in the metadata;
> +
> +2. Eliminate those inconsistencies; and
> +
> +3. Minimize further loss of data.
> +
> +Demonstrations of correct operation are necessary to build users'
> confidence
> +that the software behaves within expectations.
> +Unfortunately, it was not really feasible to perform regular
> exhaustive testing
> +of every aspect of a fsck tool until the introduction of low-cost
> virtual
> +machines with high-IOPS storage.
> +With ample hardware availability in mind, the testing strategy for
> the online
> +fsck project involves differential analysis against the existing
> fsck tools and
> +systematic testing of every attribute of every type of metadata
> object.
> +Testing can be split into four major categories, as discussed below.
> +
> +Integrated Testing with fstests
> +-------------------------------
> +
> +The primary goal of any free software QA effort is to make testing
> as
> +inexpensive and widespread as possible to maximize the scaling
> advantages of
> +community.
> +In other words, testing should maximize the breadth of filesystem
> configuration
> +scenarios and hardware setups.
> +This improves code quality by enabling the authors of online fsck to
> find and
> +fix bugs early, and helps developers of new features to find
> integration
> +issues earlier in their development effort.
> +
> +The Linux filesystem community shares a common QA testing suite,
> +`fstests
> <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/>`_, for
> +functional and regression testing.
> +Even before development work began on online fsck, fstests (when run
> on XFS)
> +would run both the ``xfs_check`` and ``xfs_repair -n`` commands on
> the test and
> +scratch filesystems between each test.
> +This provides a level of assurance that the kernel and the fsck
> tools stay in
> +alignment about what constitutes consistent metadata.
> +During development of the online checking code, fstests was modified
> to run
> +``xfs_scrub -n`` between each test to ensure that the new checking
> code
> +produces the same results as the two existing fsck tools.
> +
> +To start development of online repair, fstests was modified to run
> +``xfs_repair`` to rebuild the filesystem's metadata indices between
> tests.
> +This ensures that offline repair does not crash, leave a corrupt
> filesystem
> +after it exists, or trigger complaints from the online check.
> +This also established a baseline for what can and cannot be repaired
> offline.
> +To complete the first phase of development of online repair, fstests
> was
> +modified to be able to run ``xfs_scrub`` in a "force rebuild" mode.
> +This enables a comparison of the effectiveness of online repair as
> compared to
> +the existing offline repair tools.
> +
> +General Fuzz Testing of Metadata Blocks
> +---------------------------------------
> +
> +XFS benefits greatly from having a very robust debugging tool,
> ``xfs_db``.
> +
> +Before development of online fsck even began, a set of fstests were
> created
> +to test the rather common fault that entire metadata blocks get
> corrupted.
> +This required the creation of fstests library code that can create a
> filesystem
> +containing every possible type of metadata object.
> +Next, individual test cases were created to create a test
> filesystem, identify
> +a single block of a specific type of metadata object, trash it with
> the
> +existing ``blocktrash`` command in ``xfs_db``, and test the reaction
> of a
> +particular metadata validation strategy.
> +
> +This earlier test suite enabled XFS developers to test the ability
> of the
> +in-kernel validation functions and the ability of the offline fsck
> tool to
> +detect and eliminate the inconsistent metadata.
> +This part of the test suite was extended to cover online fsck in
> exactly the
> +same manner.
> +
> +In other words, for a given fstests filesystem configuration:
> +
> +* For each metadata object existing on the filesystem:
> +
> +  * Write garbage to it
> +
> +  * Test the reactions of:
> +
> +    1. The kernel verifiers to stop obviously bad metadata
> +    2. Offline repair (``xfs_repair``) to detect and fix
> +    3. Online repair (``xfs_scrub``) to detect and fix
> +
> +Targeted Fuzz Testing of Metadata Records
> +-----------------------------------------
> +
> +A quick conversation with the other XFS developers revealed that the
> existing
> +test infrastructure could be extended to provide

"The testing plan for ofsck includes extending the existing test
infrastructure to provide..."

Took me a moment to notice we're not talking about history any more....

> a much more powerful
> +facility: targeted fuzz testing of every metadata field of every
> metadata
> +object in the filesystem.
> +``xfs_db`` can modify every field of every metadata structure in
> every
> +block in the filesystem to simulate the effects of memory corruption
> and
> +software bugs.
> +Given that fstests already contains the ability to create a
> filesystem
> +containing every metadata format known to the filesystem, ``xfs_db``
> can be
> +used to perform exhaustive fuzz testing!
> +
> +For a given fstests filesystem configuration:
> +
> +* For each metadata object existing on the filesystem...
> +
> +  * For each record inside that metadata object...
> +
> +    * For each field inside that record...
> +
> +      * For each conceivable type of transformation that can be
> applied to a bit field...
> +
> +        1. Clear all bits
> +        2. Set all bits
> +        3. Toggle the most significant bit
> +        4. Toggle the middle bit
> +        5. Toggle the least significant bit
> +        6. Add a small quantity
> +        7. Subtract a small quantity
> +        8. Randomize the contents
> +
> +        * ...test the reactions of:
> +
> +          1. The kernel verifiers to stop obviously bad metadata
> +          2. Offline checking (``xfs_repair -n``)
> +          3. Offline repair (``xfs_repair``)
> +          4. Online checking (``xfs_scrub -n``)
> +          5. Online repair (``xfs_scrub``)
> +          6. Both repair tools (``xfs_scrub`` and then
> ``xfs_repair`` if online repair doesn't succeed)

I like the indented bullet list format tho

> +
> +This is quite the combinatoric explosion!
> +
> +Fortunately, having this much test coverage makes it easy for XFS
> developers to
> +check the responses of XFS' fsck tools.
> +Since the introduction of the fuzz testing framework, these tests
> have been
> +used to discover incorrect repair code and missing functionality for
> entire
> +classes of metadata objects in ``xfs_repair``.
> +The enhanced testing was used to finalize the deprecation of
> ``xfs_check`` by
> +confirming that ``xfs_repair`` could detect at least as many
> corruptions as
> +the older tool.
> +
> +These tests have been very valuable for ``xfs_scrub`` in the same
> ways -- they
> +allow the online fsck developers to compare online fsck against
> offline fsck,
> +and they enable XFS developers to find deficiencies in the code
> base.
> +
> +Proposed patchsets include
> +`general fuzzer improvements
> +<
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.g
> it/log/?h=fuzzer-improvements>`_,
> +`fuzzing baselines
> +<
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.g
> it/log/?h=fuzz-baseline>`_,
> +and `improvements in fuzz testing comprehensiveness
> +<
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.g
> it/log/?h=more-fuzz-testing>`_.
> +
> +Stress Testing
> +--------------
> +
> +A unique requirement to online fsck is the ability to operate on a
> filesystem
> +concurrently with regular workloads.
> +Although it is of course impossible to run ``xfs_scrub`` with *zero*
> observable
> +impact on the running system, the online repair code should never
> introduce
> +inconsistencies into the filesystem metadata, and regular workloads
> should
> +never notice resource starvation.
> +To verify that these conditions are being met, fstests has been
> enhanced in
> +the following ways:
> +
> +* For each scrub item type, create a test to exercise checking that
> item type
> +  while running ``fsstress``.
> +* For each scrub item type, create a test to exercise repairing that
> item type
> +  while running ``fsstress``.
> +* Race ``fsstress`` and ``xfs_scrub -n`` to ensure that checking the
> whole
> +  filesystem doesn't cause problems.
> +* Race ``fsstress`` and ``xfs_scrub`` in force-rebuild mode to
> ensure that
> +  force-repairing the whole filesystem doesn't cause problems.
> +* Race ``xfs_scrub`` in check and force-repair mode against
> ``fsstress`` while
> +  freezing and thawing the filesystem.
> +* Race ``xfs_scrub`` in check and force-repair mode against
> ``fsstress`` while
> +  remounting the filesystem read-only and read-write.
> +* The same, but running ``fsx`` instead of ``fsstress``. (Not done
> yet?)
> +
> +Success is defined by the ability to run all of these tests without
> observing
> +any unexpected filesystem shutdowns due to corrupted metadata,
> kernel hang
> +check warnings, or any other sort of mischief.

Seems reasonable. Other than the one nit, I think this section reads
pretty well.
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Allison

> +
> +Proposed patchsets include `general stress testing
> +<
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.g
> it/log/?h=race-scrub-and-mount-state-changes>`_
> +and the `evolution of existing per-function stress testing
> +<
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.g
> it/log/?h=refactor-scrub-stress>`_.
>
On Wed, Jan 18, 2023 at 12:03:17AM +0000, Allison Henderson wrote:
> On Fri, 2022-12-30 at 14:10 -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Start the third chapter of the online fsck design documentation.
> > This
> > covers the testing plan to make sure that both online and offline
> > fsck
> > can detect arbitrary problems and correct them without making things
> > worse.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  .../filesystems/xfs-online-fsck-design.rst | 187
> > ++++++++++++++++++++
> >  1 file changed, 187 insertions(+)
> >
> >
> > diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst
> > b/Documentation/filesystems/xfs-online-fsck-design.rst
> > index a03a7b9f0250..d630b6bdbe4a 100644
> > --- a/Documentation/filesystems/xfs-online-fsck-design.rst
> > +++ b/Documentation/filesystems/xfs-online-fsck-design.rst
> > @@ -563,3 +563,190 @@ functionality.
> >  Many of these risks are inherent to software programming.
> >  Despite this, it is hoped that this new functionality will prove
> > useful in
> >  reducing unexpected downtime.
> > +
> > +3. Testing Plan
> > +===============
> > +
> > +As stated before, fsck tools have three main goals:
> > +
> > +1. Detect inconsistencies in the metadata;
> > +
> > +2. Eliminate those inconsistencies; and
> > +
> > +3. Minimize further loss of data.
> > +
> > +Demonstrations of correct operation are necessary to build users'
> > confidence
> > +that the software behaves within expectations.
> > +Unfortunately, it was not really feasible to perform regular
> > exhaustive testing
> > +of every aspect of a fsck tool until the introduction of low-cost
> > virtual
> > +machines with high-IOPS storage.
> > +With ample hardware availability in mind, the testing strategy for
> > the online
> > +fsck project involves differential analysis against the existing
> > fsck tools and
> > +systematic testing of every attribute of every type of metadata
> > object.
> > +Testing can be split into four major categories, as discussed below.
> > +
> > +Integrated Testing with fstests
> > +-------------------------------
> > +
> > +The primary goal of any free software QA effort is to make testing
> > as
> > +inexpensive and widespread as possible to maximize the scaling
> > advantages of
> > +community.
> > +In other words, testing should maximize the breadth of filesystem
> > configuration
> > +scenarios and hardware setups.
> > +This improves code quality by enabling the authors of online fsck to
> > find and
> > +fix bugs early, and helps developers of new features to find
> > integration
> > +issues earlier in their development effort.
> > +
> > +The Linux filesystem community shares a common QA testing suite,
> > +`fstests
> > <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/>`_, for
> > +functional and regression testing.
> > +Even before development work began on online fsck, fstests (when run
> > on XFS)
> > +would run both the ``xfs_check`` and ``xfs_repair -n`` commands on
> > the test and
> > +scratch filesystems between each test.
> > +This provides a level of assurance that the kernel and the fsck
> > tools stay in
> > +alignment about what constitutes consistent metadata.
> > +During development of the online checking code, fstests was modified
> > to run
> > +``xfs_scrub -n`` between each test to ensure that the new checking
> > code
> > +produces the same results as the two existing fsck tools.
> > +
> > +To start development of online repair, fstests was modified to run
> > +``xfs_repair`` to rebuild the filesystem's metadata indices between
> > tests.
> > +This ensures that offline repair does not crash, leave a corrupt
> > filesystem
> > +after it exists, or trigger complaints from the online check.
> > +This also established a baseline for what can and cannot be repaired
> > offline.
> > +To complete the first phase of development of online repair, fstests
> > was
> > +modified to be able to run ``xfs_scrub`` in a "force rebuild" mode.
> > +This enables a comparison of the effectiveness of online repair as
> > compared to
> > +the existing offline repair tools.
> > +
> > +General Fuzz Testing of Metadata Blocks
> > +---------------------------------------
> > +
> > +XFS benefits greatly from having a very robust debugging tool,
> > ``xfs_db``.
> > +
> > +Before development of online fsck even began, a set of fstests were
> > created
> > +to test the rather common fault that entire metadata blocks get
> > corrupted.
> > +This required the creation of fstests library code that can create a
> > filesystem
> > +containing every possible type of metadata object.
> > +Next, individual test cases were created to create a test
> > filesystem, identify
> > +a single block of a specific type of metadata object, trash it with
> > the
> > +existing ``blocktrash`` command in ``xfs_db``, and test the reaction
> > of a
> > +particular metadata validation strategy.
> > +
> > +This earlier test suite enabled XFS developers to test the ability
> > of the
> > +in-kernel validation functions and the ability of the offline fsck
> > tool to
> > +detect and eliminate the inconsistent metadata.
> > +This part of the test suite was extended to cover online fsck in
> > exactly the
> > +same manner.
> > +
> > +In other words, for a given fstests filesystem configuration:
> > +
> > +* For each metadata object existing on the filesystem:
> > +
> > +  * Write garbage to it
> > +
> > +  * Test the reactions of:
> > +
> > +    1. The kernel verifiers to stop obviously bad metadata
> > +    2. Offline repair (``xfs_repair``) to detect and fix
> > +    3. Online repair (``xfs_scrub``) to detect and fix
> > +
> > +Targeted Fuzz Testing of Metadata Records
> > +-----------------------------------------
> > +
> > +A quick conversation with the other XFS developers revealed that the
> > existing
> > +test infrastructure could be extended to provide
>
> "The testing plan for ofsck includes extending the existing test
> infrastructure to provide..."
>
> Took me a moment to notice we're not talking about history any more....

Ah. Sorry about that. The sentence now reads:

"The testing plan for online fsck includes extending the existing fs
testing infrastructure to provide a much more powerful facility:
targeted fuzz testing of every metadata field of every metadata object
in the filesystem."

> > a much more powerful
> > +facility: targeted fuzz testing of every metadata field of every
> > metadata
> > +object in the filesystem.
> > +``xfs_db`` can modify every field of every metadata structure in
> > every
> > +block in the filesystem to simulate the effects of memory corruption
> > and
> > +software bugs.
> > +Given that fstests already contains the ability to create a
> > filesystem
> > +containing every metadata format known to the filesystem, ``xfs_db``
> > can be
> > +used to perform exhaustive fuzz testing!
> > +
> > +For a given fstests filesystem configuration:
> > +
> > +* For each metadata object existing on the filesystem...
> > +
> > +  * For each record inside that metadata object...
> > +
> > +    * For each field inside that record...
> > +
> > +      * For each conceivable type of transformation that can be
> > applied to a bit field...
> > +
> > +        1. Clear all bits
> > +        2. Set all bits
> > +        3. Toggle the most significant bit
> > +        4. Toggle the middle bit
> > +        5. Toggle the least significant bit
> > +        6. Add a small quantity
> > +        7. Subtract a small quantity
> > +        8. Randomize the contents
> > +
> > +        * ...test the reactions of:
> > +
> > +          1. The kernel verifiers to stop obviously bad metadata
> > +          2. Offline checking (``xfs_repair -n``)
> > +          3. Offline repair (``xfs_repair``)
> > +          4. Online checking (``xfs_scrub -n``)
> > +          5. Online repair (``xfs_scrub``)
> > +          6. Both repair tools (``xfs_scrub`` and then
> > ``xfs_repair`` if online repair doesn't succeed)
> I like the indented bullet list format tho

Thanks! I'm pleased that ... whatever renders this stuff ... actually
supports nested lists.

> > +
> > +This is quite the combinatoric explosion!
> > +
> > +Fortunately, having this much test coverage makes it easy for XFS
> > developers to
> > +check the responses of XFS' fsck tools.
> > +Since the introduction of the fuzz testing framework, these tests
> > have been
> > +used to discover incorrect repair code and missing functionality for
> > entire
> > +classes of metadata objects in ``xfs_repair``.
> > +The enhanced testing was used to finalize the deprecation of
> > ``xfs_check`` by
> > +confirming that ``xfs_repair`` could detect at least as many
> > corruptions as
> > +the older tool.
> > +
> > +These tests have been very valuable for ``xfs_scrub`` in the same
> > ways -- they
> > +allow the online fsck developers to compare online fsck against
> > offline fsck,
> > +and they enable XFS developers to find deficiencies in the code
> > base.
> > +
> > +Proposed patchsets include
> > +`general fuzzer improvements
> > +<
> > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.g
> > it/log/?h=fuzzer-improvements>`_,
> > +`fuzzing baselines
> > +<
> > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.g
> > it/log/?h=fuzz-baseline>`_,
> > +and `improvements in fuzz testing comprehensiveness
> > +<
> > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.g
> > it/log/?h=more-fuzz-testing>`_.
> > +
> > +Stress Testing
> > +--------------
> > +
> > +A unique requirement to online fsck is the ability to operate on a
> > filesystem
> > +concurrently with regular workloads.
> > +Although it is of course impossible to run ``xfs_scrub`` with *zero*
> > observable
> > +impact on the running system, the online repair code should never
> > introduce
> > +inconsistencies into the filesystem metadata, and regular workloads
> > should
> > +never notice resource starvation.
> > +To verify that these conditions are being met, fstests has been
> > enhanced in
> > +the following ways:
> > +
> > +* For each scrub item type, create a test to exercise checking that
> > item type
> > +  while running ``fsstress``.
> > +* For each scrub item type, create a test to exercise repairing that
> > item type
> > +  while running ``fsstress``.
> > +* Race ``fsstress`` and ``xfs_scrub -n`` to ensure that checking the
> > whole
> > +  filesystem doesn't cause problems.
> > +* Race ``fsstress`` and ``xfs_scrub`` in force-rebuild mode to
> > ensure that
> > +  force-repairing the whole filesystem doesn't cause problems.
> > +* Race ``xfs_scrub`` in check and force-repair mode against
> > ``fsstress`` while
> > +  freezing and thawing the filesystem.
> > +* Race ``xfs_scrub`` in check and force-repair mode against
> > ``fsstress`` while
> > +  remounting the filesystem read-only and read-write.
> > +* The same, but running ``fsx`` instead of ``fsstress``. (Not done
> > yet?)
> > +
> > +Success is defined by the ability to run all of these tests without
> > observing
> > +any unexpected filesystem shutdowns due to corrupted metadata,
> > kernel hang
> > +check warnings, or any other sort of mischief.
>
> Seems reasonable. Other than the one nit, I think this section reads
> pretty well.
> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

Woo!

--D

> Allison
> > +
> > +Proposed patchsets include `general stress testing
> > +<
> > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.g
> > it/log/?h=race-scrub-and-mount-state-changes>`_
> > +and the `evolution of existing per-function stress testing
> > +<
> > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.g
> > it/log/?h=refactor-scrub-stress>`_.
> >
>
diff --git a/Documentation/filesystems/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs-online-fsck-design.rst
index a03a7b9f0250..d630b6bdbe4a 100644
--- a/Documentation/filesystems/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs-online-fsck-design.rst
@@ -563,3 +563,190 @@ functionality.
 Many of these risks are inherent to software programming.
 Despite this, it is hoped that this new functionality will prove useful in
 reducing unexpected downtime.
+
+3. Testing Plan
+===============
+
+As stated before, fsck tools have three main goals:
+
+1. Detect inconsistencies in the metadata;
+
+2. Eliminate those inconsistencies; and
+
+3. Minimize further loss of data.
+
+Demonstrations of correct operation are necessary to build users' confidence
+that the software behaves within expectations.
+Unfortunately, it was not really feasible to perform regular exhaustive testing
+of every aspect of a fsck tool until the introduction of low-cost virtual
+machines with high-IOPS storage.
+With ample hardware availability in mind, the testing strategy for the online
+fsck project involves differential analysis against the existing fsck tools and
+systematic testing of every attribute of every type of metadata object.
+Testing can be split into four major categories, as discussed below.
+
+Integrated Testing with fstests
+-------------------------------
+
+The primary goal of any free software QA effort is to make testing as
+inexpensive and widespread as possible to maximize the scaling advantages of
+community.
+In other words, testing should maximize the breadth of filesystem configuration
+scenarios and hardware setups.
+This improves code quality by enabling the authors of online fsck to find and
+fix bugs early, and helps developers of new features to find integration
+issues earlier in their development effort.
+
+The Linux filesystem community shares a common QA testing suite,
+`fstests <https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/>`_, for
+functional and regression testing.
+Even before development work began on online fsck, fstests (when run on XFS)
+would run both the ``xfs_check`` and ``xfs_repair -n`` commands on the test and
+scratch filesystems between each test.
+This provides a level of assurance that the kernel and the fsck tools stay in
+alignment about what constitutes consistent metadata.
+During development of the online checking code, fstests was modified to run
+``xfs_scrub -n`` between each test to ensure that the new checking code
+produces the same results as the two existing fsck tools.
+
+To start development of online repair, fstests was modified to run
+``xfs_repair`` to rebuild the filesystem's metadata indices between tests.
+This ensures that offline repair does not crash, leave a corrupt filesystem
+after it exists, or trigger complaints from the online check.
+This also established a baseline for what can and cannot be repaired offline.
+To complete the first phase of development of online repair, fstests was
+modified to be able to run ``xfs_scrub`` in a "force rebuild" mode.
+This enables a comparison of the effectiveness of online repair as compared to
+the existing offline repair tools.
+
+General Fuzz Testing of Metadata Blocks
+---------------------------------------
+
+XFS benefits greatly from having a very robust debugging tool, ``xfs_db``.
+
+Before development of online fsck even began, a set of fstests were created
+to test the rather common fault that entire metadata blocks get corrupted.
+This required the creation of fstests library code that can create a filesystem
+containing every possible type of metadata object.
+Next, individual test cases were created to create a test filesystem, identify
+a single block of a specific type of metadata object, trash it with the
+existing ``blocktrash`` command in ``xfs_db``, and test the reaction of a
+particular metadata validation strategy.
+
+This earlier test suite enabled XFS developers to test the ability of the
+in-kernel validation functions and the ability of the offline fsck tool to
+detect and eliminate the inconsistent metadata.
+This part of the test suite was extended to cover online fsck in exactly the
+same manner.
+
+In other words, for a given fstests filesystem configuration:
+
+* For each metadata object existing on the filesystem:
+
+  * Write garbage to it
+
+  * Test the reactions of:
+
+    1. The kernel verifiers to stop obviously bad metadata
+    2. Offline repair (``xfs_repair``) to detect and fix
+    3. Online repair (``xfs_scrub``) to detect and fix
+
+Targeted Fuzz Testing of Metadata Records
+-----------------------------------------
+
+A quick conversation with the other XFS developers revealed that the existing
+test infrastructure could be extended to provide a much more powerful
+facility: targeted fuzz testing of every metadata field of every metadata
+object in the filesystem.
+``xfs_db`` can modify every field of every metadata structure in every
+block in the filesystem to simulate the effects of memory corruption and
+software bugs.
+Given that fstests already contains the ability to create a filesystem
+containing every metadata format known to the filesystem, ``xfs_db`` can be
+used to perform exhaustive fuzz testing!
+
+For a given fstests filesystem configuration:
+
+* For each metadata object existing on the filesystem...
+
+  * For each record inside that metadata object...
+
+    * For each field inside that record...
+
+      * For each conceivable type of transformation that can be applied to a bit field...
+
+        1. Clear all bits
+        2. Set all bits
+        3. Toggle the most significant bit
+        4. Toggle the middle bit
+        5. Toggle the least significant bit
+        6. Add a small quantity
+        7. Subtract a small quantity
+        8. Randomize the contents
+
+        * ...test the reactions of:
+
+          1. The kernel verifiers to stop obviously bad metadata
+          2. Offline checking (``xfs_repair -n``)
+          3. Offline repair (``xfs_repair``)
+          4. Online checking (``xfs_scrub -n``)
+          5. Online repair (``xfs_scrub``)
+          6. Both repair tools (``xfs_scrub`` and then ``xfs_repair`` if online repair doesn't succeed)
+
+This is quite the combinatoric explosion!
+
+Fortunately, having this much test coverage makes it easy for XFS developers to
+check the responses of XFS' fsck tools.
+Since the introduction of the fuzz testing framework, these tests have been
+used to discover incorrect repair code and missing functionality for entire
+classes of metadata objects in ``xfs_repair``.
+The enhanced testing was used to finalize the deprecation of ``xfs_check`` by
+confirming that ``xfs_repair`` could detect at least as many corruptions as
+the older tool.
+
+These tests have been very valuable for ``xfs_scrub`` in the same ways -- they
+allow the online fsck developers to compare online fsck against offline fsck,
+and they enable XFS developers to find deficiencies in the code base.
+
+Proposed patchsets include
+`general fuzzer improvements
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzzer-improvements>`_,
+`fuzzing baselines
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=fuzz-baseline>`_,
+and `improvements in fuzz testing comprehensiveness
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=more-fuzz-testing>`_.
+
+Stress Testing
+--------------
+
+A unique requirement to online fsck is the ability to operate on a filesystem
+concurrently with regular workloads.
+Although it is of course impossible to run ``xfs_scrub`` with *zero* observable
+impact on the running system, the online repair code should never introduce
+inconsistencies into the filesystem metadata, and regular workloads should
+never notice resource starvation.
+To verify that these conditions are being met, fstests has been enhanced in
+the following ways:
+
+* For each scrub item type, create a test to exercise checking that item type
+  while running ``fsstress``.
+* For each scrub item type, create a test to exercise repairing that item type
+  while running ``fsstress``.
+* Race ``fsstress`` and ``xfs_scrub -n`` to ensure that checking the whole
+  filesystem doesn't cause problems.
+* Race ``fsstress`` and ``xfs_scrub`` in force-rebuild mode to ensure that
+  force-repairing the whole filesystem doesn't cause problems.
+* Race ``xfs_scrub`` in check and force-repair mode against ``fsstress`` while
+  freezing and thawing the filesystem.
+* Race ``xfs_scrub`` in check and force-repair mode against ``fsstress`` while
+  remounting the filesystem read-only and read-write.
+* The same, but running ``fsx`` instead of ``fsstress``. (Not done yet?)
+
+Success is defined by the ability to run all of these tests without observing
+any unexpected filesystem shutdowns due to corrupted metadata, kernel hang
+check warnings, or any other sort of mischief.
+
+Proposed patchsets include `general stress testing
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=race-scrub-and-mount-state-changes>`_
+and the `evolution of existing per-function stress testing
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/log/?h=refactor-scrub-stress>`_.
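
For readers who want a concrete picture of the "Targeted Fuzz Testing of Metadata Records" loop in the patch, here is a rough Python sketch of the eight transformations applied to one unsigned field, and of how quickly the case count grows. This is only an illustration, not how fstests or ``xfs_db`` implement it: the function names, the dictionary keys, the choice of bit ``width // 2`` as the "middle bit", and the "small quantity" of 3 are all assumptions made for the example.

```python
import random

# The six reactions listed in the patch for every fuzzed field.
REACTIONS = (
    "kernel verifiers",
    "xfs_repair -n",
    "xfs_repair",
    "xfs_scrub -n",
    "xfs_scrub",
    "xfs_scrub then xfs_repair",
)


def fuzz_variants(value, width, small=3, rng=None):
    """Return {transformation: fuzzed value} for one unsigned field.

    `width` is the field size in bits. `small` (hypothetical default) is
    the "small quantity" that is added or subtracted, wrapping on overflow.
    """
    rng = rng or random.Random(0)  # seeded so runs are repeatable
    mask = (1 << width) - 1
    value &= mask
    return {
        "clear_all": 0,
        "set_all": mask,
        "toggle_msb": value ^ (1 << (width - 1)),
        "toggle_middle": value ^ (1 << (width // 2)),  # assumed "middle"
        "toggle_lsb": value ^ 1,
        "add_small": (value + small) & mask,
        "sub_small": (value - small) & mask,
        "randomize": rng.getrandbits(width),
    }


def count_test_cases(n_fields):
    """Fields x transformations x reactions, per filesystem configuration."""
    n_transformations = len(fuzz_variants(0, 8))
    return n_fields * n_transformations * len(REACTIONS)


if __name__ == "__main__":
    for name, fuzzed in fuzz_variants(0x1234, 16).items():
        print(f"{name:14s} {fuzzed:#06x}")
    print(count_test_cases(1000), "cases for 1000 fields")
```

Even with a made-up figure of 1000 fields, eight transformations and six reactions already give 48000 cases per filesystem configuration, which is the combinatoric explosion the patch text refers to.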