diff mbox

[v3,10/13] fstests: crash consistency fsx test using dm-log-writes

Message ID 20171127150439.i4tpt4yhooxnoaiz@destiny (mailing list archive)
State New, archived
Headers show

Commit Message

Josef Bacik Nov. 27, 2017, 3:04 p.m. UTC
On Mon, Nov 27, 2017 at 11:56:58AM +0200, Amir Goldstein wrote:
> On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> > Cherry-picked the test from commit 70d41e17164b
> > in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
> > Quoting from Josef's commit message:
> >
> >   The test just runs some ops and exits, then finds all of the good buffers
> >   in the directory we provided and:
> >   - replays up to the mark given
> >   - mounts the file system and compares the md5sum
> >   - unmounts and fsck's to check for metadata integrity
> >
> >   dm-log-writes will pretend to do discard and the replay-log tool will
> >   replay it properly depending on the underlying device, either by writing
> >   0's or actually calling the discard ioctl, so I've enabled discard in the
> >   test for maximum fun.
> >
> > [Amir:]
> > - Removed unneeded _test_falloc_support dynamic FSX_OPTS
> > - Fold repetitions into for loops
> > - Added place holders for using constant random seeds
> > - Add pre umount checkpint
> > - Add test to new 'replay' group
> > - Address review comments by Eryu Guan
> >
> > Cc: Josef Bacik <jbacik@fb.com>
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> 
> 
> Josef,
> 
> As you know, this test is now merged to xfstest as generic/455.
> I have been running the test for a while on xfs and it occasionally
> reports inconsistencies which I try to investigate.
> 
> In some of the reports, it appears that dm-log-writes may be exhibiting
> a reliability issue (see below).
> 

It's not a reliability issue, its a caching issue.  dm-log-writes is just
issuing bio's to the log device, and our destructor waits for all pending io
blocks to complete before exiting, so unless I've missed how dm is destroying
devices everything should be on disk.

However since we replay in userspace we are going through the blockdevice's
pagecache, so we could have stale pages left in place which is screwing us up.
Will you try this patch and see if it fixes the problem?  Thanks,

Josef

Comments

Amir Goldstein Nov. 28, 2017, 4:48 p.m. UTC | #1
On Mon, Nov 27, 2017 at 5:04 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> On Mon, Nov 27, 2017 at 11:56:58AM +0200, Amir Goldstein wrote:
>> On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> > Cherry-picked the test from commit 70d41e17164b
>> > in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
>> > Quoting from Josef's commit message:
>> >
>> >   The test just runs some ops and exits, then finds all of the good buffers
>> >   in the directory we provided and:
>> >   - replays up to the mark given
>> >   - mounts the file system and compares the md5sum
>> >   - unmounts and fsck's to check for metadata integrity
>> >
>> >   dm-log-writes will pretend to do discard and the replay-log tool will
>> >   replay it properly depending on the underlying device, either by writing
>> >   0's or actually calling the discard ioctl, so I've enabled discard in the
>> >   test for maximum fun.
>> >
>> > [Amir:]
>> > - Removed unneeded _test_falloc_support dynamic FSX_OPTS
>> > - Fold repetitions into for loops
>> > - Added place holders for using constant random seeds
>> > - Add pre umount checkpint
>> > - Add test to new 'replay' group
>> > - Address review comments by Eryu Guan
>> >
>> > Cc: Josef Bacik <jbacik@fb.com>
>> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>
>>
>> Josef,
>>
>> As you know, this test is now merged to xfstest as generic/455.
>> I have been running the test for a while on xfs and it occasionally
>> reports inconsistencies which I try to investigate.
>>
>> In some of the reports, it appears that dm-log-writes may be exhibiting
>> a reliability issue (see below).
>>
>
> It's not a reliability issue, its a caching issue.  dm-log-writes is just
> issuing bio's to the log device, and our destructor waits for all pending io
> blocks to complete before exiting, so unless I've missed how dm is destroying
> devices everything should be on disk.
>
> However since we replay in userspace we are going through the blockdevice's
> pagecache, so we could have stale pages left in place which is screwing us up.
> Will you try this patch and see if it fixes the problem?  Thanks,
>
> Josef
>
>
> diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
> index 8b80a9ce9ea9..1c502930af5e 100644
> --- a/drivers/md/dm-log-writes.c
> +++ b/drivers/md/dm-log-writes.c
> @@ -545,6 +545,8 @@ static void log_writes_dtr(struct dm_target *ti)
>                    !atomic_read(&lc->pending_blocks));
>         kthread_stop(lc->log_kthread);
>
> +       invalidate_bdev(lc->logdev->bdev);
> +       invalidate_bdev(lc->dev->bdev);
>         WARN_ON(!list_empty(&lc->logging_blocks));
>         WARN_ON(!list_empty(&lc->unflushed_blocks));
>         dm_put_device(ti, lc->dev);

Josef,

With your patch OR with my xfstest patch that adds "sync" I did not yet see
another problem of garbage fs after _log_writes_remove.

I did however, encounter this error (failure to verify read data during fsx)
from scratch/log-writes device (see attached full log).

I will keep running the test to collect more information.

Thanks,
Amir.
Josef Bacik Nov. 28, 2017, 5:21 p.m. UTC | #2
On Tue, Nov 28, 2017 at 06:48:43PM +0200, Amir Goldstein wrote:
> On Mon, Nov 27, 2017 at 5:04 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> > On Mon, Nov 27, 2017 at 11:56:58AM +0200, Amir Goldstein wrote:
> >> On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> >> > Cherry-picked the test from commit 70d41e17164b
> >> > in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
> >> > Quoting from Josef's commit message:
> >> >
> >> >   The test just runs some ops and exits, then finds all of the good buffers
> >> >   in the directory we provided and:
> >> >   - replays up to the mark given
> >> >   - mounts the file system and compares the md5sum
> >> >   - unmounts and fsck's to check for metadata integrity
> >> >
> >> >   dm-log-writes will pretend to do discard and the replay-log tool will
> >> >   replay it properly depending on the underlying device, either by writing
> >> >   0's or actually calling the discard ioctl, so I've enabled discard in the
> >> >   test for maximum fun.
> >> >
> >> > [Amir:]
> >> > - Removed unneeded _test_falloc_support dynamic FSX_OPTS
> >> > - Fold repetitions into for loops
> >> > - Added place holders for using constant random seeds
> >> > - Add pre umount checkpint
> >> > - Add test to new 'replay' group
> >> > - Address review comments by Eryu Guan
> >> >
> >> > Cc: Josef Bacik <jbacik@fb.com>
> >> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >>
> >>
> >> Josef,
> >>
> >> As you know, this test is now merged to xfstest as generic/455.
> >> I have been running the test for a while on xfs and it occasionally
> >> reports inconsistencies which I try to investigate.
> >>
> >> In some of the reports, it appears that dm-log-writes may be exhibiting
> >> a reliability issue (see below).
> >>
> >
> > It's not a reliability issue, its a caching issue.  dm-log-writes is just
> > issuing bio's to the log device, and our destructor waits for all pending io
> > blocks to complete before exiting, so unless I've missed how dm is destroying
> > devices everything should be on disk.
> >
> > However since we replay in userspace we are going through the blockdevice's
> > pagecache, so we could have stale pages left in place which is screwing us up.
> > Will you try this patch and see if it fixes the problem?  Thanks,
> >
> > Josef
> >
> >
> > diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
> > index 8b80a9ce9ea9..1c502930af5e 100644
> > --- a/drivers/md/dm-log-writes.c
> > +++ b/drivers/md/dm-log-writes.c
> > @@ -545,6 +545,8 @@ static void log_writes_dtr(struct dm_target *ti)
> >                    !atomic_read(&lc->pending_blocks));
> >         kthread_stop(lc->log_kthread);
> >
> > +       invalidate_bdev(lc->logdev->bdev);
> > +       invalidate_bdev(lc->dev->bdev);
> >         WARN_ON(!list_empty(&lc->logging_blocks));
> >         WARN_ON(!list_empty(&lc->unflushed_blocks));
> >         dm_put_device(ti, lc->dev);
> 
> Josef,
> 
> With your patch OR with my xfstest patch that adds "sync" I did not yet see
> another problem of garbage fs after _log_writes_remove.
> 
> I did however, encounter this error (failure to verify read data during fsx)
> from scratch/log-writes device (see attached full log).
> 
> I will keep running the test to collect more information.
> 

That failure I'll lay at the feet of whatever fs you are testing ;).  I'm glad
my patch fixed the replay problem, I'll send that up.  Thanks,

Josef
Amir Goldstein Nov. 28, 2017, 7:32 p.m. UTC | #3
On Tue, Nov 28, 2017 at 7:21 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> On Tue, Nov 28, 2017 at 06:48:43PM +0200, Amir Goldstein wrote:
>> On Mon, Nov 27, 2017 at 5:04 PM, Josef Bacik <josef@toxicpanda.com> wrote:
>> > On Mon, Nov 27, 2017 at 11:56:58AM +0200, Amir Goldstein wrote:
>> >> On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> >> > Cherry-picked the test from commit 70d41e17164b
>> >> > in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
>> >> > Quoting from Josef's commit message:
>> >> >
>> >> >   The test just runs some ops and exits, then finds all of the good buffers
>> >> >   in the directory we provided and:
>> >> >   - replays up to the mark given
>> >> >   - mounts the file system and compares the md5sum
>> >> >   - unmounts and fsck's to check for metadata integrity
>> >> >
>> >> >   dm-log-writes will pretend to do discard and the replay-log tool will
>> >> >   replay it properly depending on the underlying device, either by writing
>> >> >   0's or actually calling the discard ioctl, so I've enabled discard in the
>> >> >   test for maximum fun.
>> >> >
>> >> > [Amir:]
>> >> > - Removed unneeded _test_falloc_support dynamic FSX_OPTS
>> >> > - Fold repetitions into for loops
>> >> > - Added place holders for using constant random seeds
>> >> > - Add pre umount checkpint
>> >> > - Add test to new 'replay' group
>> >> > - Address review comments by Eryu Guan
>> >> >
>> >> > Cc: Josef Bacik <jbacik@fb.com>
>> >> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> >>
>> >>
>> >> Josef,
>> >>
>> >> As you know, this test is now merged to xfstest as generic/455.
>> >> I have been running the test for a while on xfs and it occasionally
>> >> reports inconsistencies which I try to investigate.
>> >>
>> >> In some of the reports, it appears that dm-log-writes may be exhibiting
>> >> a reliability issue (see below).
>> >>
>> >
>> > It's not a reliability issue, its a caching issue.  dm-log-writes is just
>> > issuing bio's to the log device, and our destructor waits for all pending io
>> > blocks to complete before exiting, so unless I've missed how dm is destroying
>> > devices everything should be on disk.
>> >
>> > However since we replay in userspace we are going through the blockdevice's
>> > pagecache, so we could have stale pages left in place which is screwing us up.
>> > Will you try this patch and see if it fixes the problem?  Thanks,
>> >
>> > Josef
>> >
>> >
>> > diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
>> > index 8b80a9ce9ea9..1c502930af5e 100644
>> > --- a/drivers/md/dm-log-writes.c
>> > +++ b/drivers/md/dm-log-writes.c
>> > @@ -545,6 +545,8 @@ static void log_writes_dtr(struct dm_target *ti)
>> >                    !atomic_read(&lc->pending_blocks));
>> >         kthread_stop(lc->log_kthread);
>> >
>> > +       invalidate_bdev(lc->logdev->bdev);
>> > +       invalidate_bdev(lc->dev->bdev);
>> >         WARN_ON(!list_empty(&lc->logging_blocks));
>> >         WARN_ON(!list_empty(&lc->unflushed_blocks));
>> >         dm_put_device(ti, lc->dev);
>>
>> Josef,
>>
>> With your patch OR with my xfstest patch that adds "sync" I did not yet see
>> another problem of garbage fs after _log_writes_remove.
>>
>> I did however, encounter this error (failure to verify read data during fsx)
>> from scratch/log-writes device (see attached full log).
>>
>> I will keep running the test to collect more information.
>>
>
> That failure I'll lay at the feet of whatever fs you are testing ;).  I'm glad
> my patch fixed the replay problem, I'll send that up.  Thanks,
>

O oh!. You are implying that xfs fails plain fsx and nobody noticed.
That is not where I would place my bet.

Let's wait and see what other testers find and what other incidents
my test machine will find.

Cheers,
Amir.
Josef Bacik Nov. 28, 2017, 8 p.m. UTC | #4
On Tue, Nov 28, 2017 at 09:32:59PM +0200, Amir Goldstein wrote:
> On Tue, Nov 28, 2017 at 7:21 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> > On Tue, Nov 28, 2017 at 06:48:43PM +0200, Amir Goldstein wrote:
> >> On Mon, Nov 27, 2017 at 5:04 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> >> > On Mon, Nov 27, 2017 at 11:56:58AM +0200, Amir Goldstein wrote:
> >> >> On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> >> >> > Cherry-picked the test from commit 70d41e17164b
> >> >> > in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
> >> >> > Quoting from Josef's commit message:
> >> >> >
> >> >> >   The test just runs some ops and exits, then finds all of the good buffers
> >> >> >   in the directory we provided and:
> >> >> >   - replays up to the mark given
> >> >> >   - mounts the file system and compares the md5sum
> >> >> >   - unmounts and fsck's to check for metadata integrity
> >> >> >
> >> >> >   dm-log-writes will pretend to do discard and the replay-log tool will
> >> >> >   replay it properly depending on the underlying device, either by writing
> >> >> >   0's or actually calling the discard ioctl, so I've enabled discard in the
> >> >> >   test for maximum fun.
> >> >> >
> >> >> > [Amir:]
> >> >> > - Removed unneeded _test_falloc_support dynamic FSX_OPTS
> >> >> > - Fold repetitions into for loops
> >> >> > - Added place holders for using constant random seeds
> >> >> > - Add pre umount checkpint
> >> >> > - Add test to new 'replay' group
> >> >> > - Address review comments by Eryu Guan
> >> >> >
> >> >> > Cc: Josef Bacik <jbacik@fb.com>
> >> >> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >> >>
> >> >>
> >> >> Josef,
> >> >>
> >> >> As you know, this test is now merged to xfstest as generic/455.
> >> >> I have been running the test for a while on xfs and it occasionally
> >> >> reports inconsistencies which I try to investigate.
> >> >>
> >> >> In some of the reports, it appears that dm-log-writes may be exhibiting
> >> >> a reliability issue (see below).
> >> >>
> >> >
> >> > It's not a reliability issue, its a caching issue.  dm-log-writes is just
> >> > issuing bio's to the log device, and our destructor waits for all pending io
> >> > blocks to complete before exiting, so unless I've missed how dm is destroying
> >> > devices everything should be on disk.
> >> >
> >> > However since we replay in userspace we are going through the blockdevice's
> >> > pagecache, so we could have stale pages left in place which is screwing us up.
> >> > Will you try this patch and see if it fixes the problem?  Thanks,
> >> >
> >> > Josef
> >> >
> >> >
> >> > diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
> >> > index 8b80a9ce9ea9..1c502930af5e 100644
> >> > --- a/drivers/md/dm-log-writes.c
> >> > +++ b/drivers/md/dm-log-writes.c
> >> > @@ -545,6 +545,8 @@ static void log_writes_dtr(struct dm_target *ti)
> >> >                    !atomic_read(&lc->pending_blocks));
> >> >         kthread_stop(lc->log_kthread);
> >> >
> >> > +       invalidate_bdev(lc->logdev->bdev);
> >> > +       invalidate_bdev(lc->dev->bdev);
> >> >         WARN_ON(!list_empty(&lc->logging_blocks));
> >> >         WARN_ON(!list_empty(&lc->unflushed_blocks));
> >> >         dm_put_device(ti, lc->dev);
> >>
> >> Josef,
> >>
> >> With your patch OR with my xfstest patch that adds "sync" I did not yet see
> >> another problem of garbage fs after _log_writes_remove.
> >>
> >> I did however, encounter this error (failure to verify read data during fsx)
> >> from scratch/log-writes device (see attached full log).
> >>
> >> I will keep running the test to collect more information.
> >>
> >
> > That failure I'll lay at the feet of whatever fs you are testing ;).  I'm glad
> > my patch fixed the replay problem, I'll send that up.  Thanks,
> >
> 
> O oh!. You are implying that xfs fails plain fsx and nobody noticed.
> That is not where I would place my bet.
> 

Why not?  This is one of the only tests that runs multiple threads of fsx at the
same time.  dm-log-writes does nothing special with the actual target disk, it
just clones the bio's as it gets them, sets the bdev for the normal device and
sends them on their way, there's no reason to think we're breaking that.
Furthermore fsx is in normal page cache mode, not O_DIRECT, so it's more likely
something else is going wrong and not dm-log-writes, unless there is so little
memory on the system that we are constantly reading cold-cache.  Thanks,

Josef
Amir Goldstein Nov. 28, 2017, 8:26 p.m. UTC | #5
On Tue, Nov 28, 2017 at 10:00 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> On Tue, Nov 28, 2017 at 09:32:59PM +0200, Amir Goldstein wrote:
>> On Tue, Nov 28, 2017 at 7:21 PM, Josef Bacik <josef@toxicpanda.com> wrote:
>> > On Tue, Nov 28, 2017 at 06:48:43PM +0200, Amir Goldstein wrote:
>> >> On Mon, Nov 27, 2017 at 5:04 PM, Josef Bacik <josef@toxicpanda.com> wrote:
>> >> > On Mon, Nov 27, 2017 at 11:56:58AM +0200, Amir Goldstein wrote:
>> >> >> On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
>> >> >> > Cherry-picked the test from commit 70d41e17164b
>> >> >> > in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
>> >> >> > Quoting from Josef's commit message:
>> >> >> >
>> >> >> >   The test just runs some ops and exits, then finds all of the good buffers
>> >> >> >   in the directory we provided and:
>> >> >> >   - replays up to the mark given
>> >> >> >   - mounts the file system and compares the md5sum
>> >> >> >   - unmounts and fsck's to check for metadata integrity
>> >> >> >
>> >> >> >   dm-log-writes will pretend to do discard and the replay-log tool will
>> >> >> >   replay it properly depending on the underlying device, either by writing
>> >> >> >   0's or actually calling the discard ioctl, so I've enabled discard in the
>> >> >> >   test for maximum fun.
>> >> >> >
>> >> >> > [Amir:]
>> >> >> > - Removed unneeded _test_falloc_support dynamic FSX_OPTS
>> >> >> > - Fold repetitions into for loops
>> >> >> > - Added place holders for using constant random seeds
>> >> >> > - Add pre umount checkpint
>> >> >> > - Add test to new 'replay' group
>> >> >> > - Address review comments by Eryu Guan
>> >> >> >
>> >> >> > Cc: Josef Bacik <jbacik@fb.com>
>> >> >> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> >> >>
>> >> >>
>> >> >> Josef,
>> >> >>
>> >> >> As you know, this test is now merged to xfstest as generic/455.
>> >> >> I have been running the test for a while on xfs and it occasionally
>> >> >> reports inconsistencies which I try to investigate.
>> >> >>
>> >> >> In some of the reports, it appears that dm-log-writes may be exhibiting
>> >> >> a reliability issue (see below).
>> >> >>
>> >> >
>> >> > It's not a reliability issue, its a caching issue.  dm-log-writes is just
>> >> > issuing bio's to the log device, and our destructor waits for all pending io
>> >> > blocks to complete before exiting, so unless I've missed how dm is destroying
>> >> > devices everything should be on disk.
>> >> >
>> >> > However since we replay in userspace we are going through the blockdevice's
>> >> > pagecache, so we could have stale pages left in place which is screwing us up.
>> >> > Will you try this patch and see if it fixes the problem?  Thanks,
>> >> >
>> >> > Josef
>> >> >
>> >> >
>> >> > diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
>> >> > index 8b80a9ce9ea9..1c502930af5e 100644
>> >> > --- a/drivers/md/dm-log-writes.c
>> >> > +++ b/drivers/md/dm-log-writes.c
>> >> > @@ -545,6 +545,8 @@ static void log_writes_dtr(struct dm_target *ti)
>> >> >                    !atomic_read(&lc->pending_blocks));
>> >> >         kthread_stop(lc->log_kthread);
>> >> >
>> >> > +       invalidate_bdev(lc->logdev->bdev);
>> >> > +       invalidate_bdev(lc->dev->bdev);
>> >> >         WARN_ON(!list_empty(&lc->logging_blocks));
>> >> >         WARN_ON(!list_empty(&lc->unflushed_blocks));
>> >> >         dm_put_device(ti, lc->dev);
>> >>
>> >> Josef,
>> >>
>> >> With your patch OR with my xfstest patch that adds "sync" I did not yet see
>> >> another problem of garbage fs after _log_writes_remove.
>> >>
>> >> I did however, encounter this error (failure to verify read data during fsx)
>> >> from scratch/log-writes device (see attached full log).
>> >>
>> >> I will keep running the test to collect more information.
>> >>
>> >
>> > That failure I'll lay at the feet of whatever fs you are testing ;).  I'm glad
>> > my patch fixed the replay problem, I'll send that up.  Thanks,
>> >
>>
>> O oh!. You are implying that xfs fails plain fsx and nobody noticed.
>> That is not where I would place my bet.
>>
>
> Why not?  This is one of the only tests that runs multiple threads of fsx at the
> same time.  dm-log-writes does nothing special with the actual target disk, it
> just clones the bio's as it gets them, sets the bdev for the normal device and
> sends them on their way, there's no reason to think we're breaking that.
> Furthermore fsx is in normal page cache mode, not O_DIRECT, so it's more likely
> something else is going wrong and not dm-log-writes, unless there is so little
> memory on the system that we are constantly reading cold-cache.  Thanks,
>

Very well, I'll run the same test on bare xfs for comparison.

Amir.
Darrick J. Wong Nov. 28, 2017, 10:33 p.m. UTC | #6
On Tue, Nov 28, 2017 at 11:17:01PM +0200, Amir Goldstein wrote:
> On Tue, Nov 28, 2017 at 10:26 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> > On Tue, Nov 28, 2017 at 10:00 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> >> On Tue, Nov 28, 2017 at 09:32:59PM +0200, Amir Goldstein wrote:
> >>> On Tue, Nov 28, 2017 at 7:21 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> >>> > On Tue, Nov 28, 2017 at 06:48:43PM +0200, Amir Goldstein wrote:
> >>> >> On Mon, Nov 27, 2017 at 5:04 PM, Josef Bacik <josef@toxicpanda.com> wrote:
> >>> >> > On Mon, Nov 27, 2017 at 11:56:58AM +0200, Amir Goldstein wrote:
> >>> >> >> On Tue, Sep 5, 2017 at 10:11 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> >>> >> >> > Cherry-picked the test from commit 70d41e17164b
> >>> >> >> > in Josef Bacik's fstests tree (https://github.com/josefbacik/fstests).
> >>> >> >> > Quoting from Josef's commit message:
> >>> >> >> >
> >>> >> >> >   The test just runs some ops and exits, then finds all of the good buffers
> >>> >> >> >   in the directory we provided and:
> >>> >> >> >   - replays up to the mark given
> >>> >> >> >   - mounts the file system and compares the md5sum
> >>> >> >> >   - unmounts and fsck's to check for metadata integrity
> >>> >> >> >
> >>> >> >> >   dm-log-writes will pretend to do discard and the replay-log tool will
> >>> >> >> >   replay it properly depending on the underlying device, either by writing
> >>> >> >> >   0's or actually calling the discard ioctl, so I've enabled discard in the
> >>> >> >> >   test for maximum fun.
> >>> >> >> >
> >>> >> >> > [Amir:]
> >>> >> >> > - Removed unneeded _test_falloc_support dynamic FSX_OPTS
> >>> >> >> > - Fold repetitions into for loops
> >>> >> >> > - Added place holders for using constant random seeds
> >>> >> >> > - Add pre umount checkpint
> >>> >> >> > - Add test to new 'replay' group
> >>> >> >> > - Address review comments by Eryu Guan
> >>> >> >> >
> >>> >> >> > Cc: Josef Bacik <jbacik@fb.com>
> >>> >> >> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >>> >> >>
> >>> >> >>
> >>> >> >> Josef,
> >>> >> >>
> >>> >> >> As you know, this test is now merged to xfstest as generic/455.
> >>> >> >> I have been running the test for a while on xfs and it occasionally
> >>> >> >> reports inconsistencies which I try to investigate.
> >>> >> >>
> >>> >> >> In some of the reports, it appears that dm-log-writes may be exhibiting
> >>> >> >> a reliability issue (see below).
> >>> >> >>
> >>> >> >
> >>> >> > It's not a reliability issue, its a caching issue.  dm-log-writes is just
> >>> >> > issuing bio's to the log device, and our destructor waits for all pending io
> >>> >> > blocks to complete before exiting, so unless I've missed how dm is destroying
> >>> >> > devices everything should be on disk.
> >>> >> >
> >>> >> > However since we replay in userspace we are going through the blockdevice's
> >>> >> > pagecache, so we could have stale pages left in place which is screwing us up.
> >>> >> > Will you try this patch and see if it fixes the problem?  Thanks,
> >>> >> >
> >>> >> > Josef
> >>> >> >
> >>> >> >
> >>> >> > diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
> >>> >> > index 8b80a9ce9ea9..1c502930af5e 100644
> >>> >> > --- a/drivers/md/dm-log-writes.c
> >>> >> > +++ b/drivers/md/dm-log-writes.c
> >>> >> > @@ -545,6 +545,8 @@ static void log_writes_dtr(struct dm_target *ti)
> >>> >> >                    !atomic_read(&lc->pending_blocks));
> >>> >> >         kthread_stop(lc->log_kthread);
> >>> >> >
> >>> >> > +       invalidate_bdev(lc->logdev->bdev);
> >>> >> > +       invalidate_bdev(lc->dev->bdev);
> >>> >> >         WARN_ON(!list_empty(&lc->logging_blocks));
> >>> >> >         WARN_ON(!list_empty(&lc->unflushed_blocks));
> >>> >> >         dm_put_device(ti, lc->dev);
> >>> >>
> >>> >> Josef,
> >>> >>
> >>> >> With your patch OR with my xfstest patch that adds "sync" I did not yet see
> >>> >> another problem of garbage fs after _log_writes_remove.
> >>> >>
> >>> >> I did however, encounter this error (failure to verify read data during fsx)
> >>> >> from scratch/log-writes device (see attached full log).
> >>> >>
> >>> >> I will keep running the test to collect more information.
> >>> >>
> >>> >
> >>> > That failure I'll lay at the feet of whatever fs you are testing ;).  I'm glad
> >>> > my patch fixed the replay problem, I'll send that up.  Thanks,
> >>> >
> >>>
> >>> O oh!. You are implying that xfs fails plain fsx and nobody noticed.
> >>> That is not where I would place my bet.
> >>>
> >>
> >> Why not?  This is one of the only tests that runs multiple threads of fsx at the
> >> same time.  dm-log-writes does nothing special with the actual target disk, it
> >> just clones the bio's as it gets them, sets the bdev for the normal device and
> >> sends them on their way, there's no reason to think we're breaking that.
> >> Furthermore fsx is in normal page cache mode, not O_DIRECT, so it's more likely
> >> something else is going wrong and not dm-log-writes, unless there is so little
> >> memory on the system that we are constantly reading cold-cache.  Thanks,
> >>
> >
> > Very well, I'll run the same test on bare xfs for comparison.
> >
> 
> Well, what do you know.
> xfs crashed and burned after 18 rounds of 455 on test partition.
> Same check fs error I blamed dm-log-writes for.
> 
> Darrick,
> 
> Attached the modified 455 test which runs fsx on test partition
> and full log (+ dmesg with nothing of interest AFAICT).
> xfs code is 4.14.0-rc8 + your patch
> "xfs: log recovery should replay deferred ops in order"
> 
> The test was run on a 100G partition on a spinning disk.
> 
> Let me know what you think.

I didn't even get /that/ far; with 4.15-rc1 and a vanilla g/455 I see
immediately:

[18395.236285] run fstests generic/457 at 2017-11-28 14:15:49
[18395.561112] XFS (sdf): Unmounting Filesystem
[18395.752713] XFS (sdf): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[18395.753469] XFS (sdf): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[18395.754259] XFS (sdf): Mounting V5 Filesystem
[18395.757525] XFS (sdf): Ending clean mount
[18395.796992] XFS (sdf): Unmounting Filesystem
[18395.949561] XFS (dm-0): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[18395.950316] XFS (dm-0): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[18395.951103] XFS (dm-0): Mounting V5 Filesystem
[18395.954946] XFS (dm-0): Ending clean mount
[18396.132701] XFS (dm-0): Unmounting Filesystem
[18396.309939] XFS (sdf): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[18396.310913] XFS (sdf): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[18396.311798] XFS (sdf): Mounting V5 Filesystem
[18396.330304] XFS (sdf): Starting recovery (logdev: internal)
[18396.334996] XFS (sdf): Metadata corruption detected at xfs_inode_buf_verify+0xc4/0x330 [xfs], xfs_inode block 0x80
[18396.336389] XFS (sdf): Unmount and run xfs_repair
[18396.336945] XFS (sdf): First 64 bytes of corrupted metadata buffer:
[18396.337679] ffffc90004080000: 00 4e 41 ed 03 01 00 00 00 00 00 00 00 00 00 00  .NA.............
[18396.338647] ffffc90004080010: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00  ................
[18396.339576] ffffc90004080020: 00 00 00 00 00 00 00 00 5a 1d e0 16 0b 69 8a 10  ........Z....i..
[18396.340341] ffffc90004080030: 5a 1d e0 16 0b 69 8a 10 00 00 00 00 00 00 00 06  Z....i..........
[18396.341091] XFS (sdf): bad inode magic/vsn daddr 128 #0 (magic=4e)
[18396.341754] XFS (sdf): metadata I/O error: block 0x80 ("xlog_recover_do..(read#2)") error 117 numblks 32
[18396.343131] XFS (sdf): log mount/recovery failed: error -117
[18396.343852] XFS (sdf): log mount failed

Off by one in the inode verifier... am I supposed to have Josef's patch?

Also, fwiw I don't see any test failures (or interesting output) with
your modified 455:

[19344.527290] run fstests generic/455 at 2017-11-28 14:31:38
[19344.948061] XFS (pmem4): Unmounting Filesystem
[19345.098299] XFS (dm-0): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[19345.099328] XFS (dm-0): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[19345.100427] XFS (dm-0): Mounting V5 Filesystem
[19345.105724] XFS (dm-0): Ending clean mount
[19345.476248] XFS (dm-0): Unmounting Filesystem
[19346.055840] XFS (pmem4): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[19346.057092] XFS (pmem4): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[19346.057883] XFS (pmem4): Mounting V5 Filesystem
[19346.059640] XFS (pmem4): Ending clean mount
[19346.084019] XFS (pmem4): Unmounting Filesystem
[19346.492297] XFS (pmem4): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[19346.492950] XFS (pmem4): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[19346.493865] XFS (pmem4): Mounting V5 Filesystem
[19346.495732] XFS (pmem4): Ending clean mount
[19346.556181] XFS (pmem4): Unmounting Filesystem
[19346.780951] XFS (pmem3): Unmounting Filesystem
[19346.857051] XFS (pmem3): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[19346.858352] XFS (pmem3): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[19346.859053] XFS (pmem3): Mounting V5 Filesystem
[19346.861319] XFS (pmem3): Ending clean mount
[19346.940162] XFS (pmem3): Unmounting Filesystem
[19346.980932] XFS (pmem3): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[19346.981640] XFS (pmem3): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[19346.982359] XFS (pmem3): Mounting V5 Filesystem
[19346.984765] XFS (pmem3): Ending clean mount
[19347.204449] XFS (pmem3): Unmounting Filesystem

I think it's supposed to cycle more than that, right?

--D

> 
> Amir.
Amir Goldstein Nov. 29, 2017, 3:33 a.m. UTC | #7
On Wed, Nov 29, 2017 at 12:33 AM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Tue, Nov 28, 2017 at 11:17:01PM +0200, Amir Goldstein wrote:
[...]
>> Well, what do you know.
>> xfs crashed and burned after 18 rounds of 455 on test partition.
>> Same check fs error I blamed dm-log-writes for.
>>
>> Darrick,
>>
>> Attached the modified 455 test which runs fsx on test partition
>> and full log (+ dmesg with nothing of interest AFAICT).
>> xfs code is 4.14.0-rc8 + your patch
>> "xfs: log recovery should replay deferred ops in order"
>>
>> The test was run on a 100G partition on a spinning disk.
>>
>> Let me know what you think.
>
> I didn't even get /that/ far; with 4.15-rc1 and a vanilla g/455 I see
> immediately:

You mean vanilla g/457?

>
> [18395.236285] run fstests generic/457 at 2017-11-28 14:15:49
> [18395.561112] XFS (sdf): Unmounting Filesystem
> [18395.752713] XFS (sdf): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
> [18395.753469] XFS (sdf): EXPERIMENTAL reflink feature enabled. Use at your own risk!
> [18395.754259] XFS (sdf): Mounting V5 Filesystem
> [18395.757525] XFS (sdf): Ending clean mount
> [18395.796992] XFS (sdf): Unmounting Filesystem
> [18395.949561] XFS (dm-0): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
> [18395.950316] XFS (dm-0): EXPERIMENTAL reflink feature enabled. Use at your own risk!
> [18395.951103] XFS (dm-0): Mounting V5 Filesystem
> [18395.954946] XFS (dm-0): Ending clean mount
> [18396.132701] XFS (dm-0): Unmounting Filesystem
> [18396.309939] XFS (sdf): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
> [18396.310913] XFS (sdf): EXPERIMENTAL reflink feature enabled. Use at your own risk!
> [18396.311798] XFS (sdf): Mounting V5 Filesystem
> [18396.330304] XFS (sdf): Starting recovery (logdev: internal)
> [18396.334996] XFS (sdf): Metadata corruption detected at xfs_inode_buf_verify+0xc4/0x330 [xfs], xfs_inode block 0x80
> [18396.336389] XFS (sdf): Unmount and run xfs_repair
> [18396.336945] XFS (sdf): First 64 bytes of corrupted metadata buffer:
> [18396.337679] ffffc90004080000: 00 4e 41 ed 03 01 00 00 00 00 00 00 00 00 00 00  .NA.............
> [18396.338647] ffffc90004080010: 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00  ................
> [18396.339576] ffffc90004080020: 00 00 00 00 00 00 00 00 5a 1d e0 16 0b 69 8a 10  ........Z....i..
> [18396.340341] ffffc90004080030: 5a 1d e0 16 0b 69 8a 10 00 00 00 00 00 00 00 06  Z....i..........
> [18396.341091] XFS (sdf): bad inode magic/vsn daddr 128 #0 (magic=4e)
> [18396.341754] XFS (sdf): metadata I/O error: block 0x80 ("xlog_recover_do..(read#2)") error 117 numblks 32
> [18396.343131] XFS (sdf): log mount/recovery failed: error -117
> [18396.343852] XFS (sdf): log mount failed
>
> Off by one in the inode verifier... am I supposed to have Josef's patch?
>

I donno. Josef's patch makes sense, but it only matters for cleaning
logdev page cache
from stale data in between 2 different runs

> Also, fwiw I don't see any test failures (or interesting output) with
> your modified 455:
>
> [19344.527290] run fstests generic/455 at 2017-11-28 14:31:38
> [19344.948061] XFS (pmem4): Unmounting Filesystem
> [19345.098299] XFS (dm-0): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
> [19345.099328] XFS (dm-0): EXPERIMENTAL reflink feature enabled. Use at your own risk!
> [19345.100427] XFS (dm-0): Mounting V5 Filesystem
> [19345.105724] XFS (dm-0): Ending clean mount
> [19345.476248] XFS (dm-0): Unmounting Filesystem
> [19346.055840] XFS (pmem4): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
> [19346.057092] XFS (pmem4): EXPERIMENTAL reflink feature enabled. Use at your own risk!
> [19346.057883] XFS (pmem4): Mounting V5 Filesystem
> [19346.059640] XFS (pmem4): Ending clean mount
> [19346.084019] XFS (pmem4): Unmounting Filesystem
> [19346.492297] XFS (pmem4): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
> [19346.492950] XFS (pmem4): EXPERIMENTAL reflink feature enabled. Use at your own risk!
> [19346.493865] XFS (pmem4): Mounting V5 Filesystem
> [19346.495732] XFS (pmem4): Ending clean mount
> [19346.556181] XFS (pmem4): Unmounting Filesystem
> [19346.780951] XFS (pmem3): Unmounting Filesystem
> [19346.857051] XFS (pmem3): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
> [19346.858352] XFS (pmem3): EXPERIMENTAL reflink feature enabled. Use at your own risk!
> [19346.859053] XFS (pmem3): Mounting V5 Filesystem
> [19346.861319] XFS (pmem3): Ending clean mount
> [19346.940162] XFS (pmem3): Unmounting Filesystem
> [19346.980932] XFS (pmem3): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
> [19346.981640] XFS (pmem3): EXPERIMENTAL reflink feature enabled. Use at your own risk!
> [19346.982359] XFS (pmem3): Mounting V5 Filesystem
> [19346.984765] XFS (pmem3): Ending clean mount
> [19347.204449] XFS (pmem3): Unmounting Filesystem
>
> I think it's supposed to cycle more than that, right?
>

Usually yes. This time around I got it after 18 rounds, but sometimes
it takes more than 100 rounds.
Modified 455 is much faster because it skips all the replays, so I can
expedite the testing.
Also, since most of the failures I have seen were on the slow spinning
disk, not sure you would get
them on pmem device.

I'll continue to bisect.

Amir.
Darrick J. Wong Dec. 4, 2017, 8:53 p.m. UTC | #8
On Mon, Dec 04, 2017 at 10:17:30PM +0200, Amir Goldstein wrote:
> On Thu, Nov 30, 2017 at 10:28 PM, Amir Goldstein <amir73il@gmail.com> wrote:
> > On Wed, Nov 29, 2017 at 5:33 AM, Amir Goldstein <amir73il@gmail.com> wrote:
> [...]
> > So far I was able to determine that your patch
> > "xfs: log recovery should replay deferred ops in order" is NOT the
> > cause of the problem.
> > This took some time, because at one point it took me 23 hr to get to
> > the dirty log
> > in test partition with modified 455 (no dm-log-writes).
> >
> > Attached metadump of corrupt test partition.
> > The xfs code this test was running with is v4.14-rc8.
> > I did not try to bisect any further because of the time it takes per commit.
> >
> > Let me know if you need any other info or if you want me to run the test
> > on my setup for specific patch and/or bisection points.
> >
> 
> I figured out what was going on in my test setup.
> The answer was in the attached dmesg, but I overlooked it:
> 
> [33816.533286] ata3.00: failed command: FLUSH CACHE EXT
> [33816.533294] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 21
>                         res 40/00:00:20:44:ba/00:00:0c:00:00/40 Emask
> 0x10 (ATA bus error)
> [33816.533300] ata3.00: status: { DRDY }
> [33816.533309] ata3: hard resetting link
> 
> It appears that that test machine had a faulty SATA cable.
> 
> This is probably more cruel to fs than a dm-flakey/dm-log-writes test...

Not as bad as the time when I discovered that one of my UASP bridges was
arbitrarily injecting 'USBUSBUSB' into bus transfers.

> Cable replaced. Back to sanity. Sorry for the noise.

:)

--D

> Amir.
diff mbox

Patch

diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 8b80a9ce9ea9..1c502930af5e 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -545,6 +545,8 @@  static void log_writes_dtr(struct dm_target *ti)
 		   !atomic_read(&lc->pending_blocks));
 	kthread_stop(lc->log_kthread);
 
+	invalidate_bdev(lc->logdev->bdev);
+	invalidate_bdev(lc->dev->bdev);
 	WARN_ON(!list_empty(&lc->logging_blocks));
 	WARN_ON(!list_empty(&lc->unflushed_blocks));
 	dm_put_device(ti, lc->dev);