diff mbox series

[3/4] btrfs: introduce new "rescue=ignoremetacsums" mount option

Message ID f6b9b9037ee7912ed2081da9c4b05fd367c9e8f8.1718082585.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: rescue= mount options enhancement to support interrupted csum conversion | expand

Commit Message

Qu Wenruo June 11, 2024, 5:21 a.m. UTC
This patch introduces "rescue=ignoremetacsums" to ignore metadata csums,
meanwhile all the other metadata sanity checks are still kept as is.

This new mount option is mostly to allow the kernel to mount an
interrupted checksum conversion (at the metadata csum overwrite stage).

And since the main part of metadata sanity checks is inside
tree-checker, we shouldn't lose much safety, and the new mount option is
rescue mount option it requires full read-only mount.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/bio.c       |  2 +-
 fs/btrfs/disk-io.c   | 16 ++++++++++------
 fs/btrfs/file-item.c |  2 +-
 fs/btrfs/fs.h        |  3 ++-
 fs/btrfs/messages.c  |  2 +-
 fs/btrfs/super.c     | 13 ++++++++++++-
 fs/btrfs/zoned.c     |  2 +-
 7 files changed, 28 insertions(+), 12 deletions(-)

Comments

David Sterba June 12, 2024, 7:38 p.m. UTC | #1
On Tue, Jun 11, 2024 at 02:51:37PM +0930, Qu Wenruo wrote:
> This patch introduces "rescue=ignoremetacsums" to ignore metadata csums,
> meanwhile all the other metadata sanity checks are still kept as is.
> 
> This new mount option is mostly to allow the kernel to mount an
> interrupted checksum conversion (at the metadata csum overwrite stage).
> 
> And since the main part of metadata sanity checks is inside
> tree-checker, we shouldn't lose much safety, and the new mount option is
> rescue mount option it requires full read-only mount.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -367,6 +367,7 @@ int btrfs_validate_extent_buffer(struct extent_buffer *eb,
>  	u8 result[BTRFS_CSUM_SIZE];
>  	const u8 *header_csum;
>  	int ret = 0;
> +	bool ignore_csum = btrfs_test_opt(fs_info, IGNOREMETACSUMS);

const

> --- a/fs/btrfs/messages.c
> +++ b/fs/btrfs/messages.c
> @@ -20,7 +20,7 @@ static const char fs_state_chars[] = {
>  	[BTRFS_FS_STATE_TRANS_ABORTED]		= 'A',
>  	[BTRFS_FS_STATE_DEV_REPLACING]		= 'R',
>  	[BTRFS_FS_STATE_DUMMY_FS_INFO]		= 0,
> -	[BTRFS_FS_STATE_NO_CSUMS]		= 'C',
> +	[BTRFS_FS_STATE_NO_DATA_CSUMS]		= 'C',

There should be the status also when the metadata checksums are not
validated, the letters are arbitrary but should reflect the state if
possible, I'd suggest to use 'S' here.

What I'm missing is the sysfs.c update, all options supported by rescue
need to be listed in rescue_opts.
Qu Wenruo June 13, 2024, 9:28 p.m. UTC | #2
在 2024/6/13 05:08, David Sterba 写道:
> On Tue, Jun 11, 2024 at 02:51:37PM +0930, Qu Wenruo wrote:
>> This patch introduces "rescue=ignoremetacsums" to ignore metadata csums,
>> meanwhile all the other metadata sanity checks are still kept as is.
>>
>> This new mount option is mostly to allow the kernel to mount an
>> interrupted checksum conversion (at the metadata csum overwrite stage).
>>
>> And since the main part of metadata sanity checks is inside
>> tree-checker, we shouldn't lose much safety, and the new mount option is
>> rescue mount option it requires full read-only mount.
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -367,6 +367,7 @@ int btrfs_validate_extent_buffer(struct extent_buffer *eb,
>>   	u8 result[BTRFS_CSUM_SIZE];
>>   	const u8 *header_csum;
>>   	int ret = 0;
>> +	bool ignore_csum = btrfs_test_opt(fs_info, IGNOREMETACSUMS);
>
> const
>
>> --- a/fs/btrfs/messages.c
>> +++ b/fs/btrfs/messages.c
>> @@ -20,7 +20,7 @@ static const char fs_state_chars[] = {
>>   	[BTRFS_FS_STATE_TRANS_ABORTED]		= 'A',
>>   	[BTRFS_FS_STATE_DEV_REPLACING]		= 'R',
>>   	[BTRFS_FS_STATE_DUMMY_FS_INFO]		= 0,
>> -	[BTRFS_FS_STATE_NO_CSUMS]		= 'C',
>> +	[BTRFS_FS_STATE_NO_DATA_CSUMS]		= 'C',
>
> There should be the status also when the metadata checksums are not
> validated, the letters are arbitrary but should reflect the state if
> possible, I'd suggest to use 'S' here.

I'd prefer to change the NO_DATA_CSUMS one to use 'D' or 'd' (for data),
meanwhile for metadata we go 'M' or 'm'.

But on the other hand, I do not think data/meta csum ignoring really
deserves a dedicated state char.

It's not really that special compared to trans aborted or dummy fs.
(The same for dev-replacing)

I can add it for now, but I believe we need some better discussion on
the fs_state_chars.

>
> What I'm missing is the sysfs.c update, all options supported by rescue
> need to be listed in rescue_opts.
>
That would be updated in the next update.

Thanks,
Qu
David Sterba June 16, 2024, 6:17 p.m. UTC | #3
On Fri, Jun 14, 2024 at 06:58:20AM +0930, Qu Wenruo wrote:
> 
> 
> 在 2024/6/13 05:08, David Sterba 写道:
> > On Tue, Jun 11, 2024 at 02:51:37PM +0930, Qu Wenruo wrote:
> >> This patch introduces "rescue=ignoremetacsums" to ignore metadata csums,
> >> meanwhile all the other metadata sanity checks are still kept as is.
> >>
> >> This new mount option is mostly to allow the kernel to mount an
> >> interrupted checksum conversion (at the metadata csum overwrite stage).
> >>
> >> And since the main part of metadata sanity checks is inside
> >> tree-checker, we shouldn't lose much safety, and the new mount option is
> >> rescue mount option it requires full read-only mount.
> >>
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -367,6 +367,7 @@ int btrfs_validate_extent_buffer(struct extent_buffer *eb,
> >>   	u8 result[BTRFS_CSUM_SIZE];
> >>   	const u8 *header_csum;
> >>   	int ret = 0;
> >> +	bool ignore_csum = btrfs_test_opt(fs_info, IGNOREMETACSUMS);
> >
> > const
> >
> >> --- a/fs/btrfs/messages.c
> >> +++ b/fs/btrfs/messages.c
> >> @@ -20,7 +20,7 @@ static const char fs_state_chars[] = {
> >>   	[BTRFS_FS_STATE_TRANS_ABORTED]		= 'A',
> >>   	[BTRFS_FS_STATE_DEV_REPLACING]		= 'R',
> >>   	[BTRFS_FS_STATE_DUMMY_FS_INFO]		= 0,
> >> -	[BTRFS_FS_STATE_NO_CSUMS]		= 'C',
> >> +	[BTRFS_FS_STATE_NO_DATA_CSUMS]		= 'C',
> >
> > There should be the status also when the metadata checksums are not
> > validated, the letters are arbitrary but should reflect the state if
> > possible, I'd suggest to use 'S' here.
> 
> I'd prefer to change the NO_DATA_CSUMS one to use 'D' or 'd' (for data),
> meanwhile for metadata we go 'M' or 'm'.

Changing would break backward compatibility, now it's part of user
visible interface. It's not an ABI or API but at least we should do such
changes without considering the consequences.

> But on the other hand, I do not think data/meta csum ignoring really
> deserves a dedicated state char.
> 
> It's not really that special compared to trans aborted or dummy fs.
> (The same for dev-replacing)

The idea of the descriptors is to make it visible that the filesystem is
in some unusual state, skipping checksum verification can make a
difference when reading blocks that would normally not pass the check.
Qu Wenruo June 16, 2024, 9:24 p.m. UTC | #4
在 2024/6/17 03:47, David Sterba 写道:
> On Fri, Jun 14, 2024 at 06:58:20AM +0930, Qu Wenruo wrote:
>>
>>
>> 在 2024/6/13 05:08, David Sterba 写道:
>>> On Tue, Jun 11, 2024 at 02:51:37PM +0930, Qu Wenruo wrote:
>>>> This patch introduces "rescue=ignoremetacsums" to ignore metadata csums,
>>>> meanwhile all the other metadata sanity checks are still kept as is.
>>>>
>>>> This new mount option is mostly to allow the kernel to mount an
>>>> interrupted checksum conversion (at the metadata csum overwrite stage).
>>>>
>>>> And since the main part of metadata sanity checks is inside
>>>> tree-checker, we shouldn't lose much safety, and the new mount option is
>>>> rescue mount option it requires full read-only mount.
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -367,6 +367,7 @@ int btrfs_validate_extent_buffer(struct extent_buffer *eb,
>>>>    	u8 result[BTRFS_CSUM_SIZE];
>>>>    	const u8 *header_csum;
>>>>    	int ret = 0;
>>>> +	bool ignore_csum = btrfs_test_opt(fs_info, IGNOREMETACSUMS);
>>>
>>> const
>>>
>>>> --- a/fs/btrfs/messages.c
>>>> +++ b/fs/btrfs/messages.c
>>>> @@ -20,7 +20,7 @@ static const char fs_state_chars[] = {
>>>>    	[BTRFS_FS_STATE_TRANS_ABORTED]		= 'A',
>>>>    	[BTRFS_FS_STATE_DEV_REPLACING]		= 'R',
>>>>    	[BTRFS_FS_STATE_DUMMY_FS_INFO]		= 0,
>>>> -	[BTRFS_FS_STATE_NO_CSUMS]		= 'C',
>>>> +	[BTRFS_FS_STATE_NO_DATA_CSUMS]		= 'C',
>>>
>>> There should be the status also when the metadata checksums are not
>>> validated, the letters are arbitrary but should reflect the state if
>>> possible, I'd suggest to use 'S' here.
>>
>> I'd prefer to change the NO_DATA_CSUMS one to use 'D' or 'd' (for data),
>> meanwhile for metadata we go 'M' or 'm'.
>
> Changing would break backward compatibility, now it's part of user
> visible interface. It's not an ABI or API but at least we should do such
> changes without considering the consequences.
>
>> But on the other hand, I do not think data/meta csum ignoring really
>> deserves a dedicated state char.
>>
>> It's not really that special compared to trans aborted or dummy fs.
>> (The same for dev-replacing)
>
> The idea of the descriptors is to make it visible that the filesystem is
> in some unusual state, skipping checksum verification can make a
> difference when reading blocks that would normally not pass the check.

On the other hand, I have already changed the metadata csum mismatch output.
It should be enough to know if we're in a skipping-metadata-csum state.

Furthermore, all of these are in rescue mode with forced RO, which
should already be a thing to mention during report/debugging.

That's why I do not think skipping data/metadata csums really needs such
dedicated handling.

Thanks,
Qu
diff mbox series

Patch

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 477f350a8bd0..07bc19aaa864 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -732,7 +732,7 @@  static bool btrfs_submit_chunk(struct btrfs_bio *bbio, int mirror_num)
 		 * point, so they are handled as part of the no-checksum case.
 		 */
 		if (inode && !(inode->flags & BTRFS_INODE_NODATASUM) &&
-		    !test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state) &&
+		    !test_bit(BTRFS_FS_STATE_NO_DATA_CSUMS, &fs_info->fs_state) &&
 		    !btrfs_is_data_reloc_root(inode->root)) {
 			if (should_async_write(bbio) &&
 			    btrfs_wq_submit_bio(bbio, bioc, &smap, mirror_num))
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 78a11f9357ed..74ee9f335dbc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -367,6 +367,7 @@  int btrfs_validate_extent_buffer(struct extent_buffer *eb,
 	u8 result[BTRFS_CSUM_SIZE];
 	const u8 *header_csum;
 	int ret = 0;
+	bool ignore_csum = btrfs_test_opt(fs_info, IGNOREMETACSUMS);
 
 	ASSERT(check);
 
@@ -399,13 +400,16 @@  int btrfs_validate_extent_buffer(struct extent_buffer *eb,
 
 	if (memcmp(result, header_csum, csum_size) != 0) {
 		btrfs_warn_rl(fs_info,
-"checksum verify failed on logical %llu mirror %u wanted " CSUM_FMT " found " CSUM_FMT " level %d",
+"checksum verify failed on logical %llu mirror %u wanted " CSUM_FMT " found " CSUM_FMT " level %d%s",
 			      eb->start, eb->read_mirror,
 			      CSUM_FMT_VALUE(csum_size, header_csum),
 			      CSUM_FMT_VALUE(csum_size, result),
-			      btrfs_header_level(eb));
-		ret = -EUCLEAN;
-		goto out;
+			      btrfs_header_level(eb),
+			      ignore_csum ? ", ignoring" : "");
+		if (!ignore_csum) {
+			ret = -EUCLEAN;
+			goto out;
+		}
 	}
 
 	if (found_level != check->level) {
@@ -2135,7 +2139,7 @@  static int load_global_roots_objectid(struct btrfs_root *tree_root,
 	/* If we have IGNOREDATACSUMS skip loading these roots. */
 	if (objectid == BTRFS_CSUM_TREE_OBJECTID &&
 	    btrfs_test_opt(fs_info, IGNOREDATACSUMS)) {
-		set_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
+		set_bit(BTRFS_FS_STATE_NO_DATA_CSUMS, &fs_info->fs_state);
 		return 0;
 	}
 
@@ -2188,7 +2192,7 @@  static int load_global_roots_objectid(struct btrfs_root *tree_root,
 
 	if (!found || ret) {
 		if (objectid == BTRFS_CSUM_TREE_OBJECTID)
-			set_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
+			set_bit(BTRFS_FS_STATE_NO_DATA_CSUMS, &fs_info->fs_state);
 
 		if (!btrfs_test_opt(fs_info, IGNOREBADROOTS))
 			ret = ret ? ret : -ENOENT;
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 55703c833f3d..6c7713fb9a75 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -353,7 +353,7 @@  blk_status_t btrfs_lookup_bio_sums(struct btrfs_bio *bbio)
 	u32 bio_offset = 0;
 
 	if ((inode->flags & BTRFS_INODE_NODATASUM) ||
-	    test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state))
+	    test_bit(BTRFS_FS_STATE_NO_DATA_CSUMS, &fs_info->fs_state))
 		return BLK_STS_OK;
 
 	/*
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 18e0d3539496..fcef7b81f9fd 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -98,7 +98,7 @@  enum {
 	/* The btrfs_fs_info created for self-tests */
 	BTRFS_FS_STATE_DUMMY_FS_INFO,
 
-	BTRFS_FS_STATE_NO_CSUMS,
+	BTRFS_FS_STATE_NO_DATA_CSUMS,
 
 	/* Indicates there was an error cleaning up a log tree. */
 	BTRFS_FS_STATE_LOG_CLEANUP_ERROR,
@@ -224,6 +224,7 @@  enum {
 	BTRFS_MOUNT_IGNOREDATACSUMS		= (1UL << 28),
 	BTRFS_MOUNT_NODISCARD			= (1UL << 29),
 	BTRFS_MOUNT_NOSPACECACHE		= (1UL << 30),
+	BTRFS_MOUNT_IGNOREMETACSUMS		= (1UL << 31),
 };
 
 /*
diff --git a/fs/btrfs/messages.c b/fs/btrfs/messages.c
index 210d9c82e2ae..38a857aba446 100644
--- a/fs/btrfs/messages.c
+++ b/fs/btrfs/messages.c
@@ -20,7 +20,7 @@  static const char fs_state_chars[] = {
 	[BTRFS_FS_STATE_TRANS_ABORTED]		= 'A',
 	[BTRFS_FS_STATE_DEV_REPLACING]		= 'R',
 	[BTRFS_FS_STATE_DUMMY_FS_INFO]		= 0,
-	[BTRFS_FS_STATE_NO_CSUMS]		= 'C',
+	[BTRFS_FS_STATE_NO_DATA_CSUMS]		= 'C',
 	[BTRFS_FS_STATE_LOG_CLEANUP_ERROR]	= 'L',
 };
 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 902423f2839c..386500b9b440 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -175,6 +175,7 @@  enum {
 	Opt_rescue_nologreplay,
 	Opt_rescue_ignorebadroots,
 	Opt_rescue_ignoredatacsums,
+	Opt_rescue_ignoremetacsums,
 	Opt_rescue_parameter_all,
 };
 
@@ -184,7 +185,9 @@  static const struct constant_table btrfs_parameter_rescue[] = {
 	{ "ignorebadroots", Opt_rescue_ignorebadroots },
 	{ "ibadroots", Opt_rescue_ignorebadroots },
 	{ "ignoredatacsums", Opt_rescue_ignoredatacsums },
+	{ "ignoremetacsums", Opt_rescue_ignoremetacsums},
 	{ "idatacsums", Opt_rescue_ignoredatacsums },
+	{ "imetacsums", Opt_rescue_ignoremetacsums},
 	{ "all", Opt_rescue_parameter_all },
 	{}
 };
@@ -570,8 +573,12 @@  static int btrfs_parse_param(struct fs_context *fc, struct fs_parameter *param)
 		case Opt_rescue_ignoredatacsums:
 			btrfs_set_opt(ctx->mount_opt, IGNOREDATACSUMS);
 			break;
+		case Opt_rescue_ignoremetacsums:
+			btrfs_set_opt(ctx->mount_opt, IGNOREMETACSUMS);
+			break;
 		case Opt_rescue_parameter_all:
 			btrfs_set_opt(ctx->mount_opt, IGNOREDATACSUMS);
+			btrfs_set_opt(ctx->mount_opt, IGNOREMETACSUMS);
 			btrfs_set_opt(ctx->mount_opt, IGNOREBADROOTS);
 			btrfs_set_opt(ctx->mount_opt, NOLOGREPLAY);
 			break;
@@ -646,7 +653,8 @@  bool btrfs_check_options(const struct btrfs_fs_info *info, unsigned long *mount_
 	if (!(flags & SB_RDONLY) &&
 	    (check_ro_option(info, *mount_opt, BTRFS_MOUNT_NOLOGREPLAY, "nologreplay") ||
 	     check_ro_option(info, *mount_opt, BTRFS_MOUNT_IGNOREBADROOTS, "ignorebadroots") ||
-	     check_ro_option(info, *mount_opt, BTRFS_MOUNT_IGNOREDATACSUMS, "ignoredatacsums")))
+	     check_ro_option(info, *mount_opt, BTRFS_MOUNT_IGNOREDATACSUMS, "ignoredatacsums") ||
+	     check_ro_option(info, *mount_opt, BTRFS_MOUNT_IGNOREMETACSUMS, "ignoremetacsums")))
 		ret = false;
 
 	if (btrfs_fs_compat_ro(info, FREE_SPACE_TREE) &&
@@ -1062,6 +1070,8 @@  static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 		print_rescue_option(seq, "ignorebadroots", &printed);
 	if (btrfs_test_opt(info, IGNOREDATACSUMS))
 		print_rescue_option(seq, "ignoredatacsums", &printed);
+	if (btrfs_test_opt(info, IGNOREMETACSUMS))
+		print_rescue_option(seq, "ignoremetacsums", &printed);
 	if (btrfs_test_opt(info, FLUSHONCOMMIT))
 		seq_puts(seq, ",flushoncommit");
 	if (btrfs_test_opt(info, DISCARD_SYNC))
@@ -1419,6 +1429,7 @@  static void btrfs_emit_options(struct btrfs_fs_info *info,
 	btrfs_info_if_set(info, old, USEBACKUPROOT, "trying to use backup root at mount time");
 	btrfs_info_if_set(info, old, IGNOREBADROOTS, "ignoring bad roots");
 	btrfs_info_if_set(info, old, IGNOREDATACSUMS, "ignoring data csums");
+	btrfs_info_if_set(info, old, IGNOREMETACSUMS, "ignoring meta csums");
 
 	btrfs_info_if_unset(info, old, NODATACOW, "setting datacow");
 	btrfs_info_if_unset(info, old, SSD, "not using ssd optimizations");
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 992a5b7756ca..386daea0ca0b 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1844,7 +1844,7 @@  void btrfs_finish_ordered_zoned(struct btrfs_ordered_extent *ordered)
 	 * here so that we don't attempt to log the csums later.
 	 */
 	if ((inode->flags & BTRFS_INODE_NODATASUM) ||
-	    test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state)) {
+	    test_bit(BTRFS_FS_STATE_NO_DATA_CSUMS, &fs_info->fs_state)) {
 		while ((sum = list_first_entry_or_null(&ordered->list,
 						       typeof(*sum), list))) {
 			list_del(&sum->list);