Message ID | 1504638680-25682-11-git-send-email-amir73il@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Sep 05, 2017 at 10:11:17PM +0300, Amir Goldstein wrote: > Cherry-picked the test from commit 70d41e17164b > in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests). > Quoting from Josef's commit message: > > The test just runs some ops and exits, then finds all of the good buffers > in the directory we provided and: > - replays up to the mark given > - mounts the file system and compares the md5sum > - unmounts and fsck's to check for metadata integrity > > dm-log-writes will pretend to do discard and the replay-log tool will > replay it properly depending on the underlying device, either by writing > 0's or actually calling the discard ioctl, so I've enabled discard in the > test for maximum fun. > > [Amir:] > - Removed unneeded _test_falloc_support dynamic FSX_OPTS > - Fold repetitions into for loops > - Added place holders for using constant random seeds > - Add pre umount checkpint > - Add test to new 'replay' group > - Address review comments by Eryu Guan > > Cc: Josef Bacik <jbacik@fb.com> > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > tests/generic/500 | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++ > tests/generic/500.out | 2 + > tests/generic/group | 1 + > 3 files changed, 138 insertions(+) > create mode 100755 tests/generic/500 > create mode 100644 tests/generic/500.out > > diff --git a/tests/generic/500 b/tests/generic/500 > new file mode 100755 > index 0000000..82f7a93 > --- /dev/null > +++ b/tests/generic/500 > @@ -0,0 +1,135 @@ > +#! /bin/bash > +# FS QA Test No. 500 > +# > +# Run fsx with log writes to verify power fail safeness. > +# > +#----------------------------------------------------------------------- > +# Copyright (c) 2015 Facebook. All Rights Reserved. > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +#----------------------------------------------------------------------- > +# > + > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +status=1 # failure is the default! > + > +_cleanup() > +{ > + _log_writes_cleanup > +} > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > +. ./common/dmlogwrites > + > +# real QA test starts here > +_supported_fs generic > +_supported_os Linux > +_require_test > +_require_scratch_nocheck > +_require_log_writes > + > +rm -f $seqres.full > + > +check_files() > +{ > + local name=$1 > + > + # Now look for our files > + for i in $(find $SANITY_DIR -type f | grep $name | grep mark) > + do > + local filename=$(basename $i) > + local mark="${filename##*.}" > + echo "checking $filename" >> $seqres.full > + _log_writes_replay_log $filename > + _scratch_mount > + local expected_md5=$(_md5_checksum $i) > + local md5=$(_md5_checksum $SCRATCH_MNT/$name) > + [ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched" > + _scratch_unmount > + _check_scratch_fs > + done > +} > + > +SANITY_DIR=$TEST_DIR/fsxtests > +rm -rf $SANITY_DIR > +mkdir $SANITY_DIR > + > +# Create the log > +_log_writes_init > + > +_log_writes_mkfs >> $seqres.full 2>&1 > + > +# Log writes emulates discard support, turn it on for maximum crying. > +_log_writes_mount -o discard > + > +NUM_FILES=4 > +NUM_OPS=200 > +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV" > +# Set random seeds for fsx runs (0 for timestamp + pid) > +seeds=(0 0 0 0) Hmm, is this array necessary? Why not pass 0 to -S directly in the commandline? And generic/502 follows the same pattern. > +# Run fsx for a while > +for j in `seq 0 $((NUM_FILES-1))` > +do > + run_check $here/ltp/fsx $FSX_OPTS -S ${seeds[$j]} -j $j $SCRATCH_MNT/testfile$j & > +done > +wait Thanks, Eryu -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 7, 2017 at 10:50 AM, Eryu Guan <eguan@redhat.com> wrote: > On Tue, Sep 05, 2017 at 10:11:17PM +0300, Amir Goldstein wrote: >> Cherry-picked the test from commit 70d41e17164b >> in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests). >> Quoting from Josef's commit message: >> >> The test just runs some ops and exits, then finds all of the good buffers >> in the directory we provided and: >> - replays up to the mark given >> - mounts the file system and compares the md5sum >> - unmounts and fsck's to check for metadata integrity >> >> dm-log-writes will pretend to do discard and the replay-log tool will >> replay it properly depending on the underlying device, either by writing >> 0's or actually calling the discard ioctl, so I've enabled discard in the >> test for maximum fun. >> >> [Amir:] >> - Removed unneeded _test_falloc_support dynamic FSX_OPTS >> - Fold repetitions into for loops >> - Added place holders for using constant random seeds >> - Add pre umount checkpint >> - Add test to new 'replay' group >> - Address review comments by Eryu Guan >> >> Cc: Josef Bacik <jbacik@fb.com> >> Signed-off-by: Amir Goldstein <amir73il@gmail.com> >> --- >> tests/generic/500 | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++ >> tests/generic/500.out | 2 + >> tests/generic/group | 1 + >> 3 files changed, 138 insertions(+) >> create mode 100755 tests/generic/500 >> create mode 100644 tests/generic/500.out >> >> diff --git a/tests/generic/500 b/tests/generic/500 >> new file mode 100755 >> index 0000000..82f7a93 >> --- /dev/null >> +++ b/tests/generic/500 >> @@ -0,0 +1,135 @@ >> +#! /bin/bash >> +# FS QA Test No. 500 >> +# >> +# Run fsx with log writes to verify power fail safeness. >> +# >> +#----------------------------------------------------------------------- >> +# Copyright (c) 2015 Facebook. All Rights Reserved. >> +# >> +# This program is free software; you can redistribute it and/or >> +# modify it under the terms of the GNU General Public License as >> +# published by the Free Software Foundation. >> +# >> +# This program is distributed in the hope that it would be useful, >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> +# GNU General Public License for more details. >> +# >> +# You should have received a copy of the GNU General Public License >> +# along with this program; if not, write the Free Software Foundation, >> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA >> +#----------------------------------------------------------------------- >> +# >> + >> +seq=`basename $0` >> +seqres=$RESULT_DIR/$seq >> +echo "QA output created by $seq" >> + >> +here=`pwd` >> +status=1 # failure is the default! >> + >> +_cleanup() >> +{ >> + _log_writes_cleanup >> +} >> +trap "_cleanup; exit \$status" 0 1 2 3 15 >> + >> +# get standard environment, filters and checks >> +. ./common/rc >> +. ./common/filter >> +. ./common/dmlogwrites >> + >> +# real QA test starts here >> +_supported_fs generic >> +_supported_os Linux >> +_require_test >> +_require_scratch_nocheck >> +_require_log_writes >> + >> +rm -f $seqres.full >> + >> +check_files() >> +{ >> + local name=$1 >> + >> + # Now look for our files >> + for i in $(find $SANITY_DIR -type f | grep $name | grep mark) >> + do >> + local filename=$(basename $i) >> + local mark="${filename##*.}" >> + echo "checking $filename" >> $seqres.full >> + _log_writes_replay_log $filename >> + _scratch_mount >> + local expected_md5=$(_md5_checksum $i) >> + local md5=$(_md5_checksum $SCRATCH_MNT/$name) >> + [ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched" >> + _scratch_unmount >> + _check_scratch_fs >> + done >> +} >> + >> +SANITY_DIR=$TEST_DIR/fsxtests >> +rm -rf $SANITY_DIR >> +mkdir $SANITY_DIR >> + >> +# Create the log >> +_log_writes_init >> + >> +_log_writes_mkfs >> $seqres.full 2>&1 >> + >> +# Log writes emulates discard support, turn it on for maximum crying. >> +_log_writes_mount -o discard >> + >> +NUM_FILES=4 >> +NUM_OPS=200 >> +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV" >> +# Set random seeds for fsx runs (0 for timestamp + pid) >> +seeds=(0 0 0 0) > > Hmm, is this array necessary? Why not pass 0 to -S directly in the > commandline? And generic/502 follows the same pattern. > The reason for the array is trying to follow your guidelines for running the test: - Normally test runs with random seed (for which the array is not needed) and every fsx process prints out the seed to the full log - When test detects a failure, you want to retry the test with same seed numbers, but those are different seed numbers for every fsx process - So the array is here to make setting those presets possible -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 07, 2017 at 11:50:25AM +0300, Amir Goldstein wrote: > On Thu, Sep 7, 2017 at 10:50 AM, Eryu Guan <eguan@redhat.com> wrote: > > On Tue, Sep 05, 2017 at 10:11:17PM +0300, Amir Goldstein wrote: > >> Cherry-picked the test from commit 70d41e17164b > >> in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests). > >> Quoting from Josef's commit message: > >> > >> The test just runs some ops and exits, then finds all of the good buffers > >> in the directory we provided and: > >> - replays up to the mark given > >> - mounts the file system and compares the md5sum > >> - unmounts and fsck's to check for metadata integrity > >> > >> dm-log-writes will pretend to do discard and the replay-log tool will > >> replay it properly depending on the underlying device, either by writing > >> 0's or actually calling the discard ioctl, so I've enabled discard in the > >> test for maximum fun. > >> > >> [Amir:] > >> - Removed unneeded _test_falloc_support dynamic FSX_OPTS > >> - Fold repetitions into for loops > >> - Added place holders for using constant random seeds > >> - Add pre umount checkpint > >> - Add test to new 'replay' group > >> - Address review comments by Eryu Guan > >> > >> Cc: Josef Bacik <jbacik@fb.com> > >> Signed-off-by: Amir Goldstein <amir73il@gmail.com> > >> --- > >> tests/generic/500 | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++ > >> tests/generic/500.out | 2 + > >> tests/generic/group | 1 + > >> 3 files changed, 138 insertions(+) > >> create mode 100755 tests/generic/500 > >> create mode 100644 tests/generic/500.out > >> > >> diff --git a/tests/generic/500 b/tests/generic/500 > >> new file mode 100755 > >> index 0000000..82f7a93 > >> --- /dev/null > >> +++ b/tests/generic/500 > >> @@ -0,0 +1,135 @@ > >> +#! /bin/bash > >> +# FS QA Test No. 500 > >> +# > >> +# Run fsx with log writes to verify power fail safeness. > >> +# > >> +#----------------------------------------------------------------------- > >> +# Copyright (c) 2015 Facebook. All Rights Reserved. > >> +# > >> +# This program is free software; you can redistribute it and/or > >> +# modify it under the terms of the GNU General Public License as > >> +# published by the Free Software Foundation. > >> +# > >> +# This program is distributed in the hope that it would be useful, > >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of > >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > >> +# GNU General Public License for more details. > >> +# > >> +# You should have received a copy of the GNU General Public License > >> +# along with this program; if not, write the Free Software Foundation, > >> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > >> +#----------------------------------------------------------------------- > >> +# > >> + > >> +seq=`basename $0` > >> +seqres=$RESULT_DIR/$seq > >> +echo "QA output created by $seq" > >> + > >> +here=`pwd` > >> +status=1 # failure is the default! > >> + > >> +_cleanup() > >> +{ > >> + _log_writes_cleanup > >> +} > >> +trap "_cleanup; exit \$status" 0 1 2 3 15 > >> + > >> +# get standard environment, filters and checks > >> +. ./common/rc > >> +. ./common/filter > >> +. ./common/dmlogwrites > >> + > >> +# real QA test starts here > >> +_supported_fs generic > >> +_supported_os Linux > >> +_require_test > >> +_require_scratch_nocheck > >> +_require_log_writes > >> + > >> +rm -f $seqres.full > >> + > >> +check_files() > >> +{ > >> + local name=$1 > >> + > >> + # Now look for our files > >> + for i in $(find $SANITY_DIR -type f | grep $name | grep mark) > >> + do > >> + local filename=$(basename $i) > >> + local mark="${filename##*.}" > >> + echo "checking $filename" >> $seqres.full > >> + _log_writes_replay_log $filename > >> + _scratch_mount > >> + local expected_md5=$(_md5_checksum $i) > >> + local md5=$(_md5_checksum $SCRATCH_MNT/$name) > >> + [ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched" > >> + _scratch_unmount > >> + _check_scratch_fs > >> + done > >> +} > >> + > >> +SANITY_DIR=$TEST_DIR/fsxtests > >> +rm -rf $SANITY_DIR > >> +mkdir $SANITY_DIR > >> + > >> +# Create the log > >> +_log_writes_init > >> + > >> +_log_writes_mkfs >> $seqres.full 2>&1 > >> + > >> +# Log writes emulates discard support, turn it on for maximum crying. > >> +_log_writes_mount -o discard > >> + > >> +NUM_FILES=4 > >> +NUM_OPS=200 > >> +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV" > >> +# Set random seeds for fsx runs (0 for timestamp + pid) > >> +seeds=(0 0 0 0) > > > > Hmm, is this array necessary? Why not pass 0 to -S directly in the > > commandline? And generic/502 follows the same pattern. > > > > The reason for the array is trying to follow your guidelines for running > the test: > - Normally test runs with random seed (for which the array is not needed) > and every fsx process prints out the seed to the full log > - When test detects a failure, you want to retry the test with same seed > numbers, but those are different seed numbers for every fsx process > - So the array is here to make setting those presets possible Yeah, an array makes debug easier. Then I think a comment about the debug usage of the array would be sufficient. Thanks! Eryu -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 7, 2017 at 11:55 AM, Eryu Guan <eguan@redhat.com> wrote: >> >> +NUM_FILES=4 >> >> +NUM_OPS=200 >> >> +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV" >> >> +# Set random seeds for fsx runs (0 for timestamp + pid) >> >> +seeds=(0 0 0 0) >> > >> > Hmm, is this array necessary? Why not pass 0 to -S directly in the >> > commandline? And generic/502 follows the same pattern. >> > >> >> The reason for the array is trying to follow your guidelines for running >> the test: >> - Normally test runs with random seed (for which the array is not needed) >> and every fsx process prints out the seed to the full log >> - When test detects a failure, you want to retry the test with same seed >> numbers, but those are different seed numbers for every fsx process >> - So the array is here to make setting those presets possible > > Yeah, an array makes debug easier. Then I think a comment about the > debug usage of the array would be sufficient. Thanks! > FYI, I removed the array from generic/502, because it was never useful during debugging. I guess the randomness stems mostly from timing of forking the fsx processes in that test. I reduced NUM_OPS in cloned files test to 10 in v4, because after a few holes and writes the clone is no longer a clone anyway. I already found a seed value to reproduce a failure of generic/502 on xfs failures are still random, but on spinning disk, I get them every 2 test runs. Amir. -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@gmail.com> wrote: > Cherry-picked the test from commit 70d41e17164b > in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests). > Quoting from Josef's commit message: > > The test just runs some ops and exits, then finds all of the good buffers > in the directory we provided and: > - replays up to the mark given > - mounts the file system and compares the md5sum > - unmounts and fsck's to check for metadata integrity > > dm-log-writes will pretend to do discard and the replay-log tool will > replay it properly depending on the underlying device, either by writing > 0's or actually calling the discard ioctl, so I've enabled discard in the > test for maximum fun. > > [Amir:] > - Removed unneeded _test_falloc_support dynamic FSX_OPTS > - Fold repetitions into for loops > - Added place holders for using constant random seeds > - Add pre umount checkpint > - Add test to new 'replay' group > - Address review comments by Eryu Guan > > Cc: Josef Bacik <jbacik@fb.com> > Signed-off-by: Amir Goldstein <amir73il@gmail.com> Josef, As you know, this test is now merged to xfstest as generic/455. I have been running the test for a while on xfs and it occasionally reports inconsistencies which I try to investigate. In some of the reports, it appears that dm-log-writes may be exhibiting a reliability issue (see below). > --- > tests/generic/500 | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++ > tests/generic/500.out | 2 + > tests/generic/group | 1 + > 3 files changed, 138 insertions(+) > create mode 100755 tests/generic/500 > create mode 100644 tests/generic/500.out > > diff --git a/tests/generic/500 b/tests/generic/500 > new file mode 100755 > index 0000000..82f7a93 > --- /dev/null > +++ b/tests/generic/500 > @@ -0,0 +1,135 @@ > +#! /bin/bash > +# FS QA Test No. 500 > +# > +# Run fsx with log writes to verify power fail safeness. > +# > +#----------------------------------------------------------------------- > +# Copyright (c) 2015 Facebook. All Rights Reserved. > +# > +# This program is free software; you can redistribute it and/or > +# modify it under the terms of the GNU General Public License as > +# published by the Free Software Foundation. > +# > +# This program is distributed in the hope that it would be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program; if not, write the Free Software Foundation, > +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA > +#----------------------------------------------------------------------- > +# > + > +seq=`basename $0` > +seqres=$RESULT_DIR/$seq > +echo "QA output created by $seq" > + > +here=`pwd` > +status=1 # failure is the default! > + > +_cleanup() > +{ > + _log_writes_cleanup > +} > +trap "_cleanup; exit \$status" 0 1 2 3 15 > + > +# get standard environment, filters and checks > +. ./common/rc > +. ./common/filter > +. ./common/dmlogwrites > + > +# real QA test starts here > +_supported_fs generic > +_supported_os Linux > +_require_test > +_require_scratch_nocheck > +_require_log_writes > + > +rm -f $seqres.full > + > +check_files() > +{ > + local name=$1 > + > + # Now look for our files > + for i in $(find $SANITY_DIR -type f | grep $name | grep mark) > + do > + local filename=$(basename $i) > + local mark="${filename##*.}" > + echo "checking $filename" >> $seqres.full > + _log_writes_replay_log $filename > + _scratch_mount > + local expected_md5=$(_md5_checksum $i) > + local md5=$(_md5_checksum $SCRATCH_MNT/$name) > + [ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched" One time, the test reported md5 mismatch on a file, but when I replayed the log to the same mark I found that md5 of the file is correct compared to the 'snapshot' file in test partition. > + _scratch_unmount > + _check_scratch_fs > + done > +} > + > +SANITY_DIR=$TEST_DIR/fsxtests > +rm -rf $SANITY_DIR > +mkdir $SANITY_DIR > + > +# Create the log > +_log_writes_init > + > +_log_writes_mkfs >> $seqres.full 2>&1 > + > +# Log writes emulates discard support, turn it on for maximum crying. > +_log_writes_mount -o discard > + > +NUM_FILES=4 > +NUM_OPS=200 > +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV" > +# Set random seeds for fsx runs (0 for timestamp + pid) > +seeds=(0 0 0 0) > +# Run fsx for a while > +for j in `seq 0 $((NUM_FILES-1))` > +do > + run_check $here/ltp/fsx $FSX_OPTS -S ${seeds[$j]} -j $j $SCRATCH_MNT/testfile$j & > +done > +wait > + > +test_md5=() > +for j in `seq 0 $((NUM_FILES-1))` > +do > + test_md5[$j]=$(_md5_checksum $SCRATCH_MNT/testfile$j) > +done > + > +# Unmount the scratch dir and tear down the log writes target > +_log_writes_mark last > +_log_writes_unmount > +_log_writes_mark end > +_log_writes_remove > +_check_scratch_fs > + Another time xfs_repair complained about dirty log right after _log_writes_remove and I wasn't able to seek to the "end" mark. Log entries were valid a few entries after the "last" mark. This leads me to believe that perhaps dm-log-writes doesn't flush all its pending log io before remove? Anyway, I added a "sync $LOGWRITES_DEV" call inside _log_writes_remove. Not sure if that helps or if that is required before removing the target? I will report if I see that the problem persists. Thought? suggestions? Thanks, Amir. -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Amir, Just to throw in what I believe I've found about dm-log-writes (though Josef would know more about this as it's just what I've concluded after looking at the code): dm-log-writes logs at IO completion, meaning disks that ignore flush operations could exhibit bugs where there are none, and dm-log-writes does synchronous marks though this may not line up with the stream of IO operations (due to buffering) unless you just did a sync. The CrashMonkey team wanted to make something similar to the "mark" operation dm-log-writes has and we concluded that the only place we could guarantee a mark would line up with the stream of IO operations the user had performed was right after a call to sync as that would force cached updates to disk. If you call mark without a sync, then you could insert a mark after a write(2) returns, but before the delayed allocation for that write(2) actually allocates blocks and changes the extent tree on disk. On Mon, Nov 27, 2017 at 3:56 AM, Amir Goldstein <amir73il@gmail.com> wrote: > On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@gmail.com> wrote: >> Cherry-picked the test from commit 70d41e17164b >> in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests). >> Quoting from Josef's commit message: >> >> The test just runs some ops and exits, then finds all of the good buffers >> in the directory we provided and: >> - replays up to the mark given >> - mounts the file system and compares the md5sum >> - unmounts and fsck's to check for metadata integrity >> >> dm-log-writes will pretend to do discard and the replay-log tool will >> replay it properly depending on the underlying device, either by writing >> 0's or actually calling the discard ioctl, so I've enabled discard in the >> test for maximum fun. >> >> [Amir:] >> - Removed unneeded _test_falloc_support dynamic FSX_OPTS >> - Fold repetitions into for loops >> - Added place holders for using constant random seeds >> - Add pre umount checkpint >> - Add test to new 'replay' group >> - Address review comments by Eryu Guan >> >> Cc: Josef Bacik <jbacik@fb.com> >> Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > > Josef, > > As you know, this test is now merged to xfstest as generic/455. > I have been running the test for a while on xfs and it occasionally > reports inconsistencies which I try to investigate. > > In some of the reports, it appears that dm-log-writes may be exhibiting > a reliability issue (see below). > >> --- >> tests/generic/500 | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++ >> tests/generic/500.out | 2 + >> tests/generic/group | 1 + >> 3 files changed, 138 insertions(+) >> create mode 100755 tests/generic/500 >> create mode 100644 tests/generic/500.out >> >> diff --git a/tests/generic/500 b/tests/generic/500 >> new file mode 100755 >> index 0000000..82f7a93 >> --- /dev/null >> +++ b/tests/generic/500 >> @@ -0,0 +1,135 @@ >> +#! /bin/bash >> +# FS QA Test No. 500 >> +# >> +# Run fsx with log writes to verify power fail safeness. >> +# >> +#----------------------------------------------------------------------- >> +# Copyright (c) 2015 Facebook. All Rights Reserved. >> +# >> +# This program is free software; you can redistribute it and/or >> +# modify it under the terms of the GNU General Public License as >> +# published by the Free Software Foundation. >> +# >> +# This program is distributed in the hope that it would be useful, >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> +# GNU General Public License for more details. >> +# >> +# You should have received a copy of the GNU General Public License >> +# along with this program; if not, write the Free Software Foundation, >> +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA >> +#----------------------------------------------------------------------- >> +# >> + >> +seq=`basename $0` >> +seqres=$RESULT_DIR/$seq >> +echo "QA output created by $seq" >> + >> +here=`pwd` >> +status=1 # failure is the default! >> + >> +_cleanup() >> +{ >> + _log_writes_cleanup >> +} >> +trap "_cleanup; exit \$status" 0 1 2 3 15 >> + >> +# get standard environment, filters and checks >> +. ./common/rc >> +. ./common/filter >> +. ./common/dmlogwrites >> + >> +# real QA test starts here >> +_supported_fs generic >> +_supported_os Linux >> +_require_test >> +_require_scratch_nocheck >> +_require_log_writes >> + >> +rm -f $seqres.full >> + >> +check_files() >> +{ >> + local name=$1 >> + >> + # Now look for our files >> + for i in $(find $SANITY_DIR -type f | grep $name | grep mark) >> + do >> + local filename=$(basename $i) >> + local mark="${filename##*.}" >> + echo "checking $filename" >> $seqres.full >> + _log_writes_replay_log $filename >> + _scratch_mount >> + local expected_md5=$(_md5_checksum $i) >> + local md5=$(_md5_checksum $SCRATCH_MNT/$name) >> + [ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched" > > One time, the test reported md5 mismatch on a file, but when I replayed > the log to the same mark I found that md5 of the file is correct compared to the > 'snapshot' file in test partition. > >> + _scratch_unmount >> + _check_scratch_fs >> + done >> +} >> + >> +SANITY_DIR=$TEST_DIR/fsxtests >> +rm -rf $SANITY_DIR >> +mkdir $SANITY_DIR >> + >> +# Create the log >> +_log_writes_init >> + >> +_log_writes_mkfs >> $seqres.full 2>&1 >> + >> +# Log writes emulates discard support, turn it on for maximum crying. >> +_log_writes_mount -o discard >> + >> +NUM_FILES=4 >> +NUM_OPS=200 >> +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV" >> +# Set random seeds for fsx runs (0 for timestamp + pid) >> +seeds=(0 0 0 0) >> +# Run fsx for a while >> +for j in `seq 0 $((NUM_FILES-1))` >> +do >> + run_check $here/ltp/fsx $FSX_OPTS -S ${seeds[$j]} -j $j $SCRATCH_MNT/testfile$j & >> +done >> +wait >> + >> +test_md5=() >> +for j in `seq 0 $((NUM_FILES-1))` >> +do >> + test_md5[$j]=$(_md5_checksum $SCRATCH_MNT/testfile$j) >> +done >> + >> +# Unmount the scratch dir and tear down the log writes target >> +_log_writes_mark last >> +_log_writes_unmount >> +_log_writes_mark end >> +_log_writes_remove >> +_check_scratch_fs >> + > > Another time xfs_repair complained about dirty log right after > _log_writes_remove > and I wasn't able to seek to the "end" mark. Log entries were valid a > few entries > after the "last" mark. > > This leads me to believe that perhaps dm-log-writes doesn't flush all > its pending > log io before remove? Anyway, I added a "sync $LOGWRITES_DEV" call inside > _log_writes_remove. Not sure if that helps or if that is required > before removing the > target? > > I will report if I see that the problem persists. > Thought? suggestions? > > Thanks, > Amir. > -- > To unsubscribe from this list: send the line "unsubscribe fstests" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Nov 27, 2017 at 08:23:49AM -0600, Ashlie Martinez wrote: > Amir, > > Just to throw in what I believe I've found about dm-log-writes (though > Josef would know more about this as it's just what I've concluded > after looking at the code): dm-log-writes logs at IO completion, > meaning disks that ignore flush operations could exhibit bugs where > there are none, and dm-log-writes does synchronous marks though this > may not line up with the stream of IO operations (due to buffering) > unless you just did a sync. The CrashMonkey team wanted to make > something similar to the "mark" operation dm-log-writes has and we > concluded that the only place we could guarantee a mark would line up > with the stream of IO operations the user had performed was right > after a call to sync as that would force cached updates to disk. If > you call mark without a sync, then you could insert a mark after a > write(2) returns, but before the delayed allocation for that write(2) > actually allocates blocks and changes the extent tree on disk. > Yeah dm-log-writes gladly hands you the loaded gun, it's your job to not shoot yourself with it. This fsx test is specifically set up to only do the mark _after_ a successful call of fsync(), which means that the mark will be after where we expect our data to be consistent. Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe fstests" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/tests/generic/500 b/tests/generic/500 new file mode 100755 index 0000000..82f7a93 --- /dev/null +++ b/tests/generic/500 @@ -0,0 +1,135 @@ +#! /bin/bash +# FS QA Test No. 500 +# +# Run fsx with log writes to verify power fail safeness. +# +#----------------------------------------------------------------------- +# Copyright (c) 2015 Facebook. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or +# modify it under the terms of the GNU General Public License as +# published by the Free Software Foundation. +# +# This program is distributed in the hope that it would be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write the Free Software Foundation, +# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +#----------------------------------------------------------------------- +# + +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +status=1 # failure is the default! + +_cleanup() +{ + _log_writes_cleanup +} +trap "_cleanup; exit \$status" 0 1 2 3 15 + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter +. ./common/dmlogwrites + +# real QA test starts here +_supported_fs generic +_supported_os Linux +_require_test +_require_scratch_nocheck +_require_log_writes + +rm -f $seqres.full + +check_files() +{ + local name=$1 + + # Now look for our files + for i in $(find $SANITY_DIR -type f | grep $name | grep mark) + do + local filename=$(basename $i) + local mark="${filename##*.}" + echo "checking $filename" >> $seqres.full + _log_writes_replay_log $filename + _scratch_mount + local expected_md5=$(_md5_checksum $i) + local md5=$(_md5_checksum $SCRATCH_MNT/$name) + [ "${md5}" != "${expected_md5}" ] && _fail "$filename md5sum mismatched" + _scratch_unmount + _check_scratch_fs + done +} + +SANITY_DIR=$TEST_DIR/fsxtests +rm -rf $SANITY_DIR +mkdir $SANITY_DIR + +# Create the log +_log_writes_init + +_log_writes_mkfs >> $seqres.full 2>&1 + +# Log writes emulates discard support, turn it on for maximum crying. +_log_writes_mount -o discard + +NUM_FILES=4 +NUM_OPS=200 +FSX_OPTS="-N $NUM_OPS -d -P $SANITY_DIR -i $LOGWRITES_DMDEV" +# Set random seeds for fsx runs (0 for timestamp + pid) +seeds=(0 0 0 0) +# Run fsx for a while +for j in `seq 0 $((NUM_FILES-1))` +do + run_check $here/ltp/fsx $FSX_OPTS -S ${seeds[$j]} -j $j $SCRATCH_MNT/testfile$j & +done +wait + +test_md5=() +for j in `seq 0 $((NUM_FILES-1))` +do + test_md5[$j]=$(_md5_checksum $SCRATCH_MNT/testfile$j) +done + +# Unmount the scratch dir and tear down the log writes target +_log_writes_mark last +_log_writes_unmount +_log_writes_mark end +_log_writes_remove +_check_scratch_fs + +# check pre umount +echo "checking pre umount" >> $seqres.full +_log_writes_replay_log last +_scratch_mount +_scratch_unmount +_check_scratch_fs + +for j in `seq 0 $((NUM_FILES-1))` +do + check_files testfile$j +done + +# Check the end +echo "checking post umount" >> $seqres.full +_log_writes_replay_log end +_scratch_mount +for j in `seq 0 $((NUM_FILES-1))` +do + md5=$(_md5_checksum $SCRATCH_MNT/testfile$j) + [ "${md5}" != "${test_md5[$j]}" ] && _fail "testfile$j end md5sum mismatched" +done +_scratch_unmount +_check_scratch_fs + +echo "Silence is golden" +status=0 +exit + diff --git a/tests/generic/500.out b/tests/generic/500.out new file mode 100644 index 0000000..883b2ca --- /dev/null +++ b/tests/generic/500.out @@ -0,0 +1,2 @@ +QA output created by 500 +Silence is golden diff --git a/tests/generic/group b/tests/generic/group index 1894de0..c857153 100644 --- a/tests/generic/group +++ b/tests/generic/group @@ -457,3 +457,4 @@ 452 auto quick 453 auto quick dir 454 auto quick attr +500 auto log replay
Cherry-picked the test from commit 70d41e17164b in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests). Quoting from Josef's commit message: The test just runs some ops and exits, then finds all of the good buffers in the directory we provided and: - replays up to the mark given - mounts the file system and compares the md5sum - unmounts and fsck's to check for metadata integrity dm-log-writes will pretend to do discard and the replay-log tool will replay it properly depending on the underlying device, either by writing 0's or actually calling the discard ioctl, so I've enabled discard in the test for maximum fun. [Amir:] - Removed unneeded _test_falloc_support dynamic FSX_OPTS - Fold repetitions into for loops - Added place holders for using constant random seeds - Add pre umount checkpint - Add test to new 'replay' group - Address review comments by Eryu Guan Cc: Josef Bacik <jbacik@fb.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- tests/generic/500 | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++ tests/generic/500.out | 2 + tests/generic/group | 1 + 3 files changed, 138 insertions(+) create mode 100755 tests/generic/500 create mode 100644 tests/generic/500.out