[V2] Btrfs: disable online scrub repair on ro cases

Message ID

1449251884-24135-1-git-send-email-bo.li.liu@oracle.com (mailing list archive)

State

New, archived

Headers

From: Liu Bo <bo.li.liu@oracle.com>
To: linux-btrfs@vger.kernel.org
Cc: codebird@birds-are-nice.me
Subject: [PATCH V2] Btrfs: disable online scrub repair on ro cases
Date: Fri,  4 Dec 2015 09:58:04 -0800
Message-Id: <1449251884-24135-1-git-send-email-bo.li.liu@oracle.com>
In-Reply-To: <1449190558-11205-1-git-send-email-bo.li.liu@oracle.com>
References: <1449190558-11205-1-git-send-email-bo.li.liu@oracle.com>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk

Commit Message

Liu Bo Dec. 4, 2015, 5:58 p.m. UTC

This disables the repair process on read-only mounts, as it can make the
system unresponsive by hitting the ASSERT() in repair_io_failure().

This can happen when scrub is running and a hardware error pops up; we
should fall back to a read-only mount gracefully instead of becoming
unresponsive.

Reported-by: Codebird <codebird@birds-are-nice.me>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2: Get @fs_info from a real pointer instead of a confusing-name u64 root.

 fs/btrfs/scrub.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

David Sterba Dec. 7, 2015, 2:37 p.m. UTC | #1

On Fri, Dec 04, 2015 at 09:58:04AM -0800, Liu Bo wrote:
> This disables the repair process on read-only mounts, as it can make the
> system unresponsive by hitting the ASSERT() in repair_io_failure().
> 
> This can happen when scrub is running and a hardware error pops up; we
> should fall back to a read-only mount gracefully instead of becoming
> unresponsive.

So this will also report the error as uncorrectable. This might be a bit
misleading if a device error happens first and then some potentially
correctable errors are detected. These could be accounted as 'unverified'
errors, which has the closest meaning.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Liu Bo Dec. 7, 2015, 6:26 p.m. UTC | #2

On Mon, Dec 07, 2015 at 03:37:43PM +0100, David Sterba wrote:
> On Fri, Dec 04, 2015 at 09:58:04AM -0800, Liu Bo wrote:
> > This disables the repair process on read-only mounts, as it can make the
> > system unresponsive by hitting the ASSERT() in repair_io_failure().
> > 
> > This can happen when scrub is running and a hardware error pops up; we
> > should fall back to a read-only mount gracefully instead of becoming
> > unresponsive.
> 
> So this will also report the error as uncorrectable. This might be a bit
> misleading if a device error happens first and then some potentially
> correctable errors are detected. These could be accounted as 'unverified'
> errors, which has the closest meaning.

Makes sense, we can do:
if (ret == -EROFS) {
	spin_lock();
	unverified++;
	spin_unlock();
}

However, in scrub_fixup_nodatasum() all errors, including ENOMEM from path
allocation and transaction failures, are counted as 'uncorrectable'. So I
wonder whether that means 'uncorrectable' is only valid within this scrub
process?

Thanks,

-liubo

David Sterba Jan. 5, 2016, 1:54 p.m. UTC | #3

On Mon, Dec 07, 2015 at 10:26:05AM -0800, Liu Bo wrote:
> On Mon, Dec 07, 2015 at 03:37:43PM +0100, David Sterba wrote:
> > On Fri, Dec 04, 2015 at 09:58:04AM -0800, Liu Bo wrote:
> > > This disables the repair process on read-only mounts, as it can make the
> > > system unresponsive by hitting the ASSERT() in repair_io_failure().
> > > 
> > > This can happen when scrub is running and a hardware error pops up; we
> > > should fall back to a read-only mount gracefully instead of becoming
> > > unresponsive.
> > 
> > So this will also report the error as uncorrectable. This might be a bit
> > misleading if a device error happens first and then some potentially
> > correctable errors are detected. These could be accounted as 'unverified'
> > errors, which has the closest meaning.
> 
> Makes sense, we can do:
> if (ret == -EROFS) {
> 	spin_lock();
> 	unverified++;
> 	spin_unlock();
> }
> 
> However, in scrub_fixup_nodatasum() all errors, including ENOMEM from path
> allocation and transaction failures, are counted as 'uncorrectable'. So I
> wonder whether that means 'uncorrectable' is only valid within this scrub
> process?

I'm not sure we have a proper definition of the various stats. My
expectation as a user is that 'uncorrectable' refers to permanent errors,
so we should try to match the type of error everywhere.
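
To make the accounting question above concrete, here is a minimal userspace
sketch of one way to split the counters. It is not the kernel code: the
struct, the function name, and the choice to treat both -EROFS and -ENOMEM
as non-permanent are assumptions made for illustration only. The thread
itself only proposes -EROFS and does not settle on a definition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the scrub statistics counters. */
struct scrub_stats {
	unsigned long uncorrectable_errors; /* permanent: repair cannot succeed */
	unsigned long unverified_errors;    /* repair could not be attempted */
};

/*
 * Classify the result of a repair attempt. Assumption of this sketch:
 * -EROFS and -ENOMEM are environmental (the block may still be repairable
 * later), and every other negative errno is treated as permanent.
 * Kernel code would update the counters under the stats spinlock.
 */
static void account_repair_result(struct scrub_stats *stats, int ret)
{
	assert(stats != NULL);

	if (ret >= 0)
		return;		/* repair succeeded, nothing to count */

	switch (ret) {
	case -EROFS:
	case -ENOMEM:
		stats->unverified_errors++;
		break;
	default:
		stats->uncorrectable_errors++;
		break;
	}
}
```

With this split, a scrub that hits a read-only filesystem would leave
'uncorrectable' reserved for errors where a repair was attempted and could
not succeed.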

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 2907a77..cb8a4e0 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -682,11 +682,14 @@  static int scrub_fixup_readpage(u64 inum, u64 offset, u64 root, void *fixup_ctx)
 	struct btrfs_root *local_root;
 	int srcu_index;
 
+	fs_info = fixup->root->fs_info;
+	if (fs_info->sb->s_flags & MS_RDONLY)
+		return -EROFS;
+
 	key.objectid = root;
 	key.type = BTRFS_ROOT_ITEM_KEY;
 	key.offset = (u64)-1;
 
-	fs_info = fixup->root->fs_info;
 	srcu_index = srcu_read_lock(&fs_info->subvol_srcu);
 
 	local_root = btrfs_read_fs_root_no_name(fs_info, &key);
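
The hunk above moves the fs_info assignment up so that the read-only check
runs before srcu_read_lock() is taken, which means the early return needs no
matching unlock. The same guard pattern can be sketched in userspace as
follows; mock_fs, MOCK_RDONLY and mock_fixup_page() are hypothetical names
for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_RDONLY 0x1UL	/* hypothetical stand-in for MS_RDONLY */

struct mock_fs {
	unsigned long flags;		/* stand-in for sb->s_flags */
	unsigned int repairs_started;	/* repairs that got past the guard */
};

static int mock_fixup_page(struct mock_fs *fs)
{
	assert(fs != NULL);

	/* Bail out before any lock is taken or any write is set up. */
	if (fs->flags & MOCK_RDONLY)
		return -EROFS;

	fs->repairs_started++;	/* placeholder for the actual repair work */
	return 0;
}
```

A caller sees -EROFS on a read-only filesystem and can account for it
without the repair path ever being entered.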
