Message ID | d935c0e0-c2c8-a0ec-bb07-5c3879dd1be0@gmx.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> >>>>> TL;DR It seems as regression in 4.17, but I managed to find a > >>>>> workaround to make filesystem rw mountable again. > >>>>> > >>>>> Kernel built from tag v4.17-rc1 > >>>>> btrfs-progs 4.16 > >>>>> > >>>>> Tonight two my machines (PC (ECC RAM) and laptop(non-ECC RAM)) were > >>>>> doing usual weekly balance with this command via cron: > >>>>> btrfs balance start -musage=50 -dusage=50 <mountpoint> > >>>>> Both machines run same kernel version. > >>>>> > >>>>> On PC that caused root and "data" filesystems to go readonly. Root > >>>>> is on an SSD with data single and metadata DUP, "data" filesystem > >>>>> is on 2 HDDs with RAID1 for data and metadata. > >>>>> > >>>>> On laptop only /home went ro, it's on NVMe SSD with data single and > >>>>> metadata DUP. > >>>>> > >>>>> Btrfs check of PC rootfs was without any errors in both modes, I did > >>>>> them once each before reboot on readonly filesystem with --force > >>>>> flag and then from live usb. Same output without any errors. > >>>>> > >>>>> After reboot kernel refused rw mount rootfs with the same error as > >>>>> during cron balance, ro mount was accepted, error during rw mount: > >>>>> BTRFS: error (device dm-17) in merge_reloc_roots:2465: errno=-117 > >>> > >>>> 117 means EUCLEAN, which could be caused by the newly introduced > >>>> first_key and level check. > >>> > >>>> Please apply this hotfix to fix it. > >>>> btrfs: Only check first key for committed tree blocks > >>>> (Which is included in latest pull request) > >>> > >>>> Also, please consider enable CONFIG_BTRFS_DEBUG to provide extra > >>>> debug info. > >>> > >>>> Thanks, > >>>> Qu > >>> > >>> I tried 4.17-rc2 (as the pull request was pulled) with > >>> CONFIG_BTRFS_DEBUG on LVM snapshot of laptop home partition (/dev/vdb) > >>> in a VM (VM kernel sees only snapshot so no UUID collisions). Dmesg > >>> attached. > >> > >> Thanks for the info and your previous btrfs-image. > >> > >> The image itself shows nothing wrong, so it should be runtime problem. > >> Would you please apply these two debug patches? > >> https://patchwork.kernel.org/patch/10335133/ > >> https://patchwork.kernel.org/patch/10335135/ > >> > >> And the attached diff file? > >> > >> My guess is the parent node is not initialized correctly in this case. > >> > >> Thanks, > >> Qu > > > > Dmesg from kernel with all three patches applied attached. > > > Thanks for the debug info, it really helps a lot! > > It turns out that I'm just a super idiot, a typo in replace_path() > caused this, and it could not be trigger unless we enter it from > relocation recovery. > > Please try the attached patch to see if it solves the problem. > > Thanks, > Qu Glad to help, the patch solved the problem, rw mount is successful and balance finished, no errors or debug output, btrfs check is clean in both modes. [ 2.842718] BTRFS: device label home devid 1 transid 277952 /dev/vdb [ 2.924965] BTRFS: device label root devid 1 transid 84092 /dev/vda2 [ 3.072271] BTRFS info (device vda2): use lzo compression, level 0 [ 3.072897] BTRFS info (device vda2): enabling auto defrag [ 3.073476] BTRFS info (device vda2): using free space tree [ 3.074049] BTRFS info (device vda2): has skinny extents [ 5.411821] BTRFS info (device vda2): using free space tree [ 24.925293] BTRFS info (device vdb): using free space tree [ 24.925324] BTRFS info (device vdb): has skinny extents [ 31.711868] BTRFS info (device vdb): continuing balance [ 31.721658] BTRFS info (device vdb): checking UUID tree [ 31.822920] BTRFS info (device vdb): relocating block group 69889687552flags data [ 33.730399] BTRFS info (device vdb): found 12 extents [ 36.950699] BTRFS info (device vdb): found 12 extents [ 37.030813] BTRFS info (device vdb): relocating block group 67742203904flags metadata|dup [ 37.104174] BTRFS info (device vdb): relocating block group 67708649472 flags system|dup [ 37.189843] BTRFS info (device vdb): found 1 extents
From 4b70eb864192ec5cf54a7e67e2957ddf0e5c0f6f Mon Sep 17 00:00:00 2001 From: Qu Wenruo <wqu@suse.com> Date: Mon, 23 Apr 2018 16:13:55 +0800 Subject: [PATCH] btrfs: Fix wrong first_key parameter in replace_path Commit 581c1760415c ("btrfs: Validate child tree block's level and first key") introduced new @first_key parameter for read_tree_block(), however caller in replace_path() is parasing wrong key to read_tree_block(). It should use parameter @first_key other than @key. Normally it won't expose problem as @key is normally initialzied to the same value of @first_key we expect. However in relocation recovery case, @key can be set to (0, 0, 0), and since no valid key in relocation tree can be (0, 0, 0), it will cause read_tree_block() to return -EUCLEAN and interrupt relocation recovery. Fix it by setting @first_key correctly. Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/relocation.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 00b7d3231821..b041b945a7ae 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -1841,7 +1841,7 @@ int replace_path(struct btrfs_trans_handle *trans, old_bytenr = btrfs_node_blockptr(parent, slot); blocksize = fs_info->nodesize; old_ptr_gen = btrfs_node_ptr_generation(parent, slot); - btrfs_node_key_to_cpu(parent, &key, slot); + btrfs_node_key_to_cpu(parent, &first_key, slot); if (level <= max_level) { eb = path->nodes[level]; -- 2.17.0