Message ID | 20231117144317.10882-1-bfoster@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | generic/459: improve shutdown/read-only check to accommodate bcachefs |
On Fri, Nov 17, 2023 at 09:43:17AM -0500, Brian Foster wrote:
> generic/459 occasionally fails on bcachefs because the deliberately
> induced I/O errors caused by exhausting the overprovisioned thin
> pool can lead to filesystem shutdown. This test considers this
> expected behavior on certain fs', but only checks for the ext4
> remount read-only behavior. bcachefs does a similar emergency
> read-only transition in response to certain I/O errors, but it
> behaves more similar to an XFS shutdown and doesn't necessarily
> reflect "ro" state in the mount table (unless induced by userspace).
>
> Since the test already runs a touch command to help trigger the ext4
> error handling sequence, this can be tweaked to serve double duty
> and also more accurately detect read-only status on bcachefs.
> Refactor into a small helper, check for an EROFS return to the touch
> command, and consider the fs read-only if either that or the mount
> entry check indicates it.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>
> Something I realized when writing up the commit log is that the EROFS
> check doesn't technically cover XFS, which IIRC returns EIO in response
> to any sorts of writes once the fs has shutdown. I'm not sure this
> matters currently because XFS doesn't shutdown due to the default
> behavior to retry failed I/Os, but technically if XFS were configured to
> not retry I/O errors and go right to permanent failure, I suspect it
> would fail this test in the same way bcachefs does.
>
> That could be addressed fairly easily by also checking for EIO error
> message output, or just assuming touch failure == shutdown, etc. I don't
> have much preference on that, so thoughts appreciated.

I wish there was a better way to signal that a filesystem has shut down,
though ATM that isn't even a VFS level concept. I generally assume that
touch failure == shutdown if the fs was previously writable.

OTOH with statmount landing soonish, perhaps we ought to apply for a new
SB_SHUTDOWN state flag for it to export?

--D

> Brian
>
> tests/generic/459 | 30 +++++++++++++++++++++++-------
> 1 file changed, 23 insertions(+), 7 deletions(-)
>
> diff --git a/tests/generic/459 b/tests/generic/459
> index 4dd7a43b..d0c48325 100755
> --- a/tests/generic/459
> +++ b/tests/generic/459
> @@ -57,6 +57,26 @@ origpsize=200
>  virtsize=300
>  newpsize=300
>
> +# Check whether the filesystem has shutdown or remounted read-only. Behavior can
> +# differ based on filesystem and configuration. Some fs' may not have remounted
> +# without an additional write while others may have shutdown but do not
> +# necessarily reflect read-only state in the mount options. Check both here to
> +# cover the various scenarios.
> +is_shutdown_or_ro()
> +{
> +	ro=0
> +
> +	# if the fs has not shutdown, this may help trigger a remount-ro
> +	touch $SCRATCH_MNT/newfile 2>&1 | \
> +		grep "Read-only file system" > /dev/null
> +	[ $? == 0 ] && ro=1
> +
> +	_fs_options /dev/mapper/$vgname-$snapname | grep -w "ro" > /dev/null
> +	[ $? == 0 ] && ro=1
> +
> +	echo $ro
> +}
> +
>  # Ensure we have enough disk space
>  _scratch_mkfs_sized $((350 * 1024 * 1024)) >>$seqres.full 2>&1
>
> @@ -113,13 +133,9 @@ ret=$?
>  # - The filesystem stays in Read-Write mode, but can be frozen/thawed
>  # without getting stuck.
>  if [ $ret -ne 0 ]; then
> -	# freeze failed, filesystem should reject further writes and remount
> -	# as readonly. Sometimes the previous write process won't trigger
> -	# ro-remount, e.g. on ext3/4, do additional touch here to make sure
> -	# filesystems see the metadata I/O error.
> -	touch $SCRATCH_MNT/newfile >/dev/null 2>&1
> -	ISRO=$(_fs_options /dev/mapper/$vgname-$snapname | grep -w "ro")
> -	if [ -n "$ISRO" ]; then
> +	# freeze failed, filesystem should reject further writes
> +	ISRO=`is_shutdown_or_ro`
> +	if [ $ISRO == 1 ]; then
>  		echo "Test OK"
>  	else
>  		echo "Freeze failed and FS isn't Read-Only. Test Failed"
> --
> 2.41.0
>
>
On Fri, Nov 17, 2023 at 02:14:34PM -0800, Darrick J. Wong wrote:
> On Fri, Nov 17, 2023 at 09:43:17AM -0500, Brian Foster wrote:
> > generic/459 occasionally fails on bcachefs because the deliberately
> > induced I/O errors caused by exhausting the overprovisioned thin
> > pool can lead to filesystem shutdown. This test considers this
> > expected behavior on certain fs', but only checks for the ext4
> > remount read-only behavior. bcachefs does a similar emergency
> > read-only transition in response to certain I/O errors, but it
> > behaves more similar to an XFS shutdown and doesn't necessarily
> > reflect "ro" state in the mount table (unless induced by userspace).
> >
> > Since the test already runs a touch command to help trigger the ext4
> > error handling sequence, this can be tweaked to serve double duty
> > and also more accurately detect read-only status on bcachefs.
> > Refactor into a small helper, check for an EROFS return to the touch
> > command, and consider the fs read-only if either that or the mount
> > entry check indicates it.
> >
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> >
> > Something I realized when writing up the commit log is that the EROFS
> > check doesn't technically cover XFS, which IIRC returns EIO in response
> > to any sorts of writes once the fs has shutdown. I'm not sure this
> > matters currently because XFS doesn't shutdown due to the default
> > behavior to retry failed I/Os, but technically if XFS were configured to
> > not retry I/O errors and go right to permanent failure, I suspect it
> > would fail this test in the same way bcachefs does.
> >
> > That could be addressed fairly easily by also checking for EIO error
> > message output, or just assuming touch failure == shutdown, etc. I don't
> > have much preference on that, so thoughts appreciated.
>
> I wish there was a better way to signal that a filesystem has shut down,
> though ATM that isn't even a VFS level concept. I generally assume that
> touch failure == shutdown if the fs was previously writable.
>

Yeah, mildly annoying there was no good way to detect this. I think all
we really have atm is dmesg scraping to call out unexpected shutdowns.
That still needs to be added for bcachefs btw, but I've been holding off
because it leads to noise on various dm-flakey oriented tests and
whatnot that complain about shutdowns that otherwise seem to be expected
from bcachefs. Though perhaps the right thing to do there is to enable
it and just filter those tests out for the time being. But that's a
separate topic...

It sounds reasonable to me to just use the touch failure in this
particular case. I'll post a v2 with that tweak next week.

> OTOH with statmount landing soonish, perhaps we ought to apply for a new
> SB_SHUTDOWN state flag for it to export?
>

Perhaps worth a discussion..? The flipside I suppose is that shutdown
has historically been a rather hacky, informalized thing with
inconsistent behavior across fs' simply because it's a last ditch
failsafe technique that we hope should never happen. Is it worth trying
to generalize/formalize/document something that is basically a "has my
filesystem crashed?" check..?

We do have the vfs GOINGDOWN ioctl. I wonder if something like a new
flag for a nomodify/check goingdown mode or something that would return
whether a shutdown would occur or already has would be sufficient... hm?

Brian

> --D
>
> > Brian
> >
> > tests/generic/459 | 30 +++++++++++++++++++++++-------
> > 1 file changed, 23 insertions(+), 7 deletions(-)
> >
> > diff --git a/tests/generic/459 b/tests/generic/459
> > index 4dd7a43b..d0c48325 100755
> > --- a/tests/generic/459
> > +++ b/tests/generic/459
> > @@ -57,6 +57,26 @@ origpsize=200
> >  virtsize=300
> >  newpsize=300
> >
> > +# Check whether the filesystem has shutdown or remounted read-only. Behavior can
> > +# differ based on filesystem and configuration. Some fs' may not have remounted
> > +# without an additional write while others may have shutdown but do not
> > +# necessarily reflect read-only state in the mount options. Check both here to
> > +# cover the various scenarios.
> > +is_shutdown_or_ro()
> > +{
> > +	ro=0
> > +
> > +	# if the fs has not shutdown, this may help trigger a remount-ro
> > +	touch $SCRATCH_MNT/newfile 2>&1 | \
> > +		grep "Read-only file system" > /dev/null
> > +	[ $? == 0 ] && ro=1
> > +
> > +	_fs_options /dev/mapper/$vgname-$snapname | grep -w "ro" > /dev/null
> > +	[ $? == 0 ] && ro=1
> > +
> > +	echo $ro
> > +}
> > +
> >  # Ensure we have enough disk space
> >  _scratch_mkfs_sized $((350 * 1024 * 1024)) >>$seqres.full 2>&1
> >
> > @@ -113,13 +133,9 @@ ret=$?
> >  # - The filesystem stays in Read-Write mode, but can be frozen/thawed
> >  # without getting stuck.
> >  if [ $ret -ne 0 ]; then
> > -	# freeze failed, filesystem should reject further writes and remount
> > -	# as readonly. Sometimes the previous write process won't trigger
> > -	# ro-remount, e.g. on ext3/4, do additional touch here to make sure
> > -	# filesystems see the metadata I/O error.
> > -	touch $SCRATCH_MNT/newfile >/dev/null 2>&1
> > -	ISRO=$(_fs_options /dev/mapper/$vgname-$snapname | grep -w "ro")
> > -	if [ -n "$ISRO" ]; then
> > +	# freeze failed, filesystem should reject further writes
> > +	ISRO=`is_shutdown_or_ro`
> > +	if [ $ISRO == 1 ]; then
> >  		echo "Test OK"
> >  	else
> >  		echo "Freeze failed and FS isn't Read-Only. Test Failed"
> > --
> > 2.41.0
> >
> >
>
diff --git a/tests/generic/459 b/tests/generic/459
index 4dd7a43b..d0c48325 100755
--- a/tests/generic/459
+++ b/tests/generic/459
@@ -57,6 +57,26 @@ origpsize=200
 virtsize=300
 newpsize=300
 
+# Check whether the filesystem has shutdown or remounted read-only. Behavior can
+# differ based on filesystem and configuration. Some fs' may not have remounted
+# without an additional write while others may have shutdown but do not
+# necessarily reflect read-only state in the mount options. Check both here to
+# cover the various scenarios.
+is_shutdown_or_ro()
+{
+	ro=0
+
+	# if the fs has not shutdown, this may help trigger a remount-ro
+	touch $SCRATCH_MNT/newfile 2>&1 | \
+		grep "Read-only file system" > /dev/null
+	[ $? == 0 ] && ro=1
+
+	_fs_options /dev/mapper/$vgname-$snapname | grep -w "ro" > /dev/null
+	[ $? == 0 ] && ro=1
+
+	echo $ro
+}
+
 # Ensure we have enough disk space
 _scratch_mkfs_sized $((350 * 1024 * 1024)) >>$seqres.full 2>&1
 
@@ -113,13 +133,9 @@ ret=$?
 # - The filesystem stays in Read-Write mode, but can be frozen/thawed
 # without getting stuck.
 if [ $ret -ne 0 ]; then
-	# freeze failed, filesystem should reject further writes and remount
-	# as readonly. Sometimes the previous write process won't trigger
-	# ro-remount, e.g. on ext3/4, do additional touch here to make sure
-	# filesystems see the metadata I/O error.
-	touch $SCRATCH_MNT/newfile >/dev/null 2>&1
-	ISRO=$(_fs_options /dev/mapper/$vgname-$snapname | grep -w "ro")
-	if [ -n "$ISRO" ]; then
+	# freeze failed, filesystem should reject further writes
+	ISRO=`is_shutdown_or_ro`
+	if [ $ISRO == 1 ]; then
 		echo "Test OK"
 	else
 		echo "Freeze failed and FS isn't Read-Only. Test Failed"
generic/459 occasionally fails on bcachefs because the deliberately
induced I/O errors caused by exhausting the overprovisioned thin
pool can lead to filesystem shutdown. This test considers this
expected behavior on certain fs', but only checks for the ext4
remount read-only behavior. bcachefs does a similar emergency
read-only transition in response to certain I/O errors, but it
behaves more similar to an XFS shutdown and doesn't necessarily
reflect "ro" state in the mount table (unless induced by userspace).

Since the test already runs a touch command to help trigger the ext4
error handling sequence, this can be tweaked to serve double duty
and also more accurately detect read-only status on bcachefs.
Refactor into a small helper, check for an EROFS return to the touch
command, and consider the fs read-only if either that or the mount
entry check indicates it.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

Something I realized when writing up the commit log is that the EROFS
check doesn't technically cover XFS, which IIRC returns EIO in response
to any sorts of writes once the fs has shutdown. I'm not sure this
matters currently because XFS doesn't shutdown due to the default
behavior to retry failed I/Os, but technically if XFS were configured to
not retry I/O errors and go right to permanent failure, I suspect it
would fail this test in the same way bcachefs does.

That could be addressed fairly easily by also checking for EIO error
message output, or just assuming touch failure == shutdown, etc. I don't
have much preference on that, so thoughts appreciated.

Brian

 tests/generic/459 | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)
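
For illustration only, the "touch failure == shutdown" alternative mentioned in the note above might look roughly like the sketch below. This is not the posted patch nor any later revision of it. The two positional parameters (a mount point and a comma-separated mount options string) are hypothetical and exist only so the function can be run outside fstests; in the test itself they would presumably come from `$SCRATCH_MNT` and the output of `_fs_options`.

```shell
# Sketch only, under the assumptions stated above; not the actual v2.
# $1: mount point to probe (hypothetical parameter)
# $2: comma-separated mount options string (hypothetical parameter)
is_shutdown_or_ro()
{
	local mnt="$1"
	local opts="$2"

	# Treat any failure to create a file as shutdown or read-only. This
	# does not depend on the error text, so EROFS and EIO are handled
	# the same way. On a healthy fs the write may help trigger the
	# remount-ro path, as the touch in the original test does.
	if ! touch "$mnt/newfile" >/dev/null 2>&1; then
		echo 1
		return
	fi

	# The touch succeeded; fall back to the mount options. Splitting on
	# commas and matching whole lines accepts only a standalone "ro".
	if printf '%s\n' "$opts" | tr ',' '\n' | grep -qx "ro"; then
		echo 1
		return
	fi

	echo 0
}
```

The caller would keep the same shape as in the patch, comparing the echoed value against 1. One difference from the patch worth noting: `grep -w "ro"` also matches the `ro` in an option such as `errors=remount-ro`, whereas the whole-line match here does not; whether that matters for this test is left open.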