diff mbox

btrfs: scrub: set error stats when tree block spanning stripes

Message ID 981687019fcca4b4f7455b7a755e48888cbd24ff.1440510311.git.zhaolei@cn.fujitsu.com (mailing list archive)
State Accepted
Headers show

Commit Message

Zhaolei Aug. 25, 2015, 1:45 p.m. UTC
It is better to show error stats to user when we found tree block
spanning stripes.

On a btrfs created by old version of btrfs-convert:
Before patch:
  # btrfs scrub start -B /dev/vdh
  scrub done for 8b342d35-2904-41ab-b3cb-2f929709cf47
          scrub started at Tue Aug 25 21:19:09 2015 and finished after 00:00:00
          total bytes scrubbed: 53.54MiB with 0 errors
  # dmesg
  ...
  [  128.711434] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27000832
  [  128.712744] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27066368
  ...

After patch:
  # btrfs scrub start -B /dev/vdh
  scrub done for ff7f844b-7a4e-4b1a-88a9-8252ab25be1b
          scrub started at Tue Aug 25 21:42:29 2015 and finished after 00:00:00
          total bytes scrubbed: 53.60MiB with 2 errors
          error details:
          corrected errors: 0, uncorrectable errors: 2, unverified errors: 0
  ERROR: There are uncorrectable errors.
  # dmesg
  ...omit...
  #

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
---
 fs/btrfs/scrub.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

David Sterba Aug. 26, 2015, 5:19 p.m. UTC | #1
On Tue, Aug 25, 2015 at 09:45:27PM +0800, Zhao Lei wrote:
> It is better to show error stats to user when we found tree block
> spanning stripes.
> 
> On a btrfs created by old version of btrfs-convert:
> Before patch:
>   # btrfs scrub start -B /dev/vdh
>   scrub done for 8b342d35-2904-41ab-b3cb-2f929709cf47
>           scrub started at Tue Aug 25 21:19:09 2015 and finished after 00:00:00
>           total bytes scrubbed: 53.54MiB with 0 errors
>   # dmesg
>   ...
>   [  128.711434] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27000832
>   [  128.712744] BTRFS error (device vdh): scrub: tree block 27054080 spanning stripes, ignored. logical=27066368
>   ...
> 
> After patch:
>   # btrfs scrub start -B /dev/vdh
>   scrub done for ff7f844b-7a4e-4b1a-88a9-8252ab25be1b
>           scrub started at Tue Aug 25 21:42:29 2015 and finished after 00:00:00
>           total bytes scrubbed: 53.60MiB with 2 errors
>           error details:
>           corrected errors: 0, uncorrectable errors: 2, unverified errors: 0
>   ERROR: There are uncorrectable errors.

I agree that reporting that is a good thing so the syslog is consistent
with the status output. Is the error really uncorrectable? I'd have to
re-read the reports and root cause analysis again, but can we do it at
least via checker?

The uncorrectable errors are especially worrisome so I'd like to know
what kind of answer should we give.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhaolei Aug. 27, 2015, 1:44 a.m. UTC | #2
Hi, David Sterba

> -----Original Message-----
> From: David Sterba [mailto:dsterba@suse.cz]
> Sent: Thursday, August 27, 2015 1:20 AM
> To: Zhao Lei <zhaolei@cn.fujitsu.com>
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH] btrfs: scrub: set error stats when tree block spanning
> stripes
> 
> On Tue, Aug 25, 2015 at 09:45:27PM +0800, Zhao Lei wrote:
> > It is better to show error stats to user when we found tree block
> > spanning stripes.
> >
> > On a btrfs created by old version of btrfs-convert:
> > Before patch:
> >   # btrfs scrub start -B /dev/vdh
> >   scrub done for 8b342d35-2904-41ab-b3cb-2f929709cf47
> >           scrub started at Tue Aug 25 21:19:09 2015 and finished after
> 00:00:00
> >           total bytes scrubbed: 53.54MiB with 0 errors
> >   # dmesg
> >   ...
> >   [  128.711434] BTRFS error (device vdh): scrub: tree block 27054080
> spanning stripes, ignored. logical=27000832
> >   [  128.712744] BTRFS error (device vdh): scrub: tree block 27054080
> spanning stripes, ignored. logical=27066368
> >   ...
> >
> > After patch:
> >   # btrfs scrub start -B /dev/vdh
> >   scrub done for ff7f844b-7a4e-4b1a-88a9-8252ab25be1b
> >           scrub started at Tue Aug 25 21:42:29 2015 and finished after
> 00:00:00
> >           total bytes scrubbed: 53.60MiB with 2 errors
> >           error details:
> >           corrected errors: 0, uncorrectable errors: 2, unverified errors: 0
> >   ERROR: There are uncorrectable errors.
> 
> I agree that reporting that is a good thing so the syslog is consistent with the
> status output. Is the error really uncorrectable? I'd have to re-read the reports
> and root cause analysis again, but can we do it at least via checker?
> 
This type of error-filesystem is created by old btrfs-convert,
Reported by: Chris Murphy <lists@colorremedies.com>
In maillist, with title:
[BUG] kernel BUG at fs/btrfs/scrub.c:1956!, when scrubbing freshly convert

Then we get the reason: It is caused by bug in btrfs-convert.
Detail is in maillist titled:
RE: [BUG] kernel BUG at fs/btrfs/scrub.c:1956!, when scrubbing freshly converted ext
In 2015-7-23

And fixed btrfs-progs, to avoid creating such a wrong filesystem
with btrfs-convert, and report this type of error in fsck.
With patch titled:
[PATCH 0/4] Metadata crossing stripe boundary fixes
[PATCH 1/4] btrfs: print-tree: print stripe len of a chunk
[PATCH 2/4] btrfs: fsck: Check if a metadata tree block crossing stripe boundary
[PATCH 3/4] btrfs: extent-tree: Avoid allocating tree block that cross stripe boundar
[PATCH 4/4] btrfs: convert: Avoid allocating metadata extent crossing stripe boundary
In 2015-7-23

Also fixed kernel code to avoid panic with:
[PATCH] btrfs: Fix scrub panic when leaf accross stripes in btrfs maillist.
In 2015-7-23

Above patchs are in current btrfs-progs and integration-4.3
branch.

But it is still uncorrectable because neither kernel nor btrfsck can fix
it(only can report).

In short, it is caused by old version of btrfs-convert, and already fixed,
and this patch is used to make scrub report it in output.

Thanks
Zhaolei

> The uncorrectable errors are especially worrisome so I'd like to know what kind
> of answer should we give.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Sept. 3, 2015, 1:17 p.m. UTC | #3
On Thu, Aug 27, 2015 at 09:44:05AM +0800, Zhao Lei wrote:
[...]

Thanks for the links and summary!

> But it is still uncorrectable because neither kernel nor btrfsck can fix
> it(only can report).

I'm curious if it' is possible to write code that would fix the problem,
eg. te relocate or shift the blocks. The 'btrfs rescue' command group is
meant for this type of targeted fixes, unless it's easy to integrate it
to the check & repair itself.

> In short, it is caused by old version of btrfs-convert, and already fixed,
> and this patch is used to make scrub report it in output.

I assume there are people affected by this problem so it would be good
to have a repair tool.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo Sept. 4, 2015, 3:21 a.m. UTC | #4
Hi, David and Zhao,

David Sterba wrote on 2015/09/03 15:17 +0200:
> On Thu, Aug 27, 2015 at 09:44:05AM +0800, Zhao Lei wrote:
> [...]
>
> Thanks for the links and summary!
>
>> But it is still uncorrectable because neither kernel nor btrfsck can fix
>> it(only can report).
>
> I'm curious if it' is possible to write code that would fix the problem,
> eg. te relocate or shift the blocks. The 'btrfs rescue' command group is
> meant for this type of targeted fixes, unless it's easy to integrate it
> to the check & repair itself.

I'm happy and willing to add the relocating ability for btrfs-progs.
As that will provide a lot of flexibility, while the code should be 
quite easy if I do it in a NOCOW method.
(Find a extent, copy it to other place, follow backref and modify)

The only concern is, does it mean that we need to implement all the 
features of btrfs kernel in user space tool?
And will it make the codes duplicated between kernel and progs?

Thanks,
Qu

>
>> In short, it is caused by old version of btrfs-convert, and already fixed,
>> and this patch is used to make scrub report it in output.
>
> I assume there are people affected by this problem so it would be good
> to have a repair tool.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba Sept. 7, 2015, 12:37 p.m. UTC | #5
On Fri, Sep 04, 2015 at 11:21:05AM +0800, Qu Wenruo wrote:
> I'm happy and willing to add the relocating ability for btrfs-progs.
> As that will provide a lot of flexibility, while the code should be 
> quite easy if I do it in a NOCOW method.
> (Find a extent, copy it to other place, follow backref and modify)
> 
> The only concern is, does it mean that we need to implement all the 
> features of btrfs kernel in user space tool?

I don't think we need a 1:1 feature parity. We might end up doing a one
chunk relocation, but relocating an extent would be enough for a start.

Doing the operations in userspace is easier in some respects as the data
are not live, we can do different trade-offs, utilize readahead etc.
Some operations might end up significantly faster and might be used as
one-time actions eg. when converting multi-terabyte filesystem to
another raid type. But this is a long-term goal.

> And will it make the codes duplicated between kernel and progs?

At this moment the code is not unified and we have to copy the code from
kernel where needed. But as far as the code does not diverge, it'll be
easier to use exact kernel sources in progs. Current versions of the
common sources are not close enough, so we'd have to unify them first.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index c69c75e..81482e4 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3025,6 +3025,9 @@  static noinline_for_stack int scrub_raid56_parity(struct scrub_ctx *sctx,
 			     logic_start + map->stripe_len)) {
 				btrfs_err(fs_info, "scrub: tree block %llu spanning stripes, ignored. logical=%llu",
 					  key.objectid, logic_start);
+				spin_lock(&sctx->stat_lock);
+				sctx->stat.uncorrectable_errors++;
+				spin_unlock(&sctx->stat_lock);
 				goto next;
 			}
 again:
@@ -3374,6 +3377,9 @@  static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 					   "scrub: tree block %llu spanning "
 					   "stripes, ignored. logical=%llu",
 				       key.objectid, logical);
+				spin_lock(&sctx->stat_lock);
+				sctx->stat.uncorrectable_errors++;
+				spin_unlock(&sctx->stat_lock);
 				goto next;
 			}