Message ID | 01fa2ddededefc7f03ca4d6df2cccfdbf550aa26.1645157220.git.naohiro.aota@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: zoned: mark relocation as writing |
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
On Fri, Feb 18, 2022 at 01:14:19PM +0900, Naohiro Aota wrote:
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -3240,6 +3240,9 @@ int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
>  	u64 length;
>  	int ret;
>
> +	/* Assert we called sb_start_write(), not to race with FS freezing */
> +	ASSERT(sb_write_started(fs_info->sb));

I see this assertion fail, it's not on all testing VMs, but has
happened a few times already so it's probably some race:

[ 2927.013859] BTRFS warning (device vdc): devid 1 uuid 4335c7a6-652c-4389-8ea9-270c00fa9880 is missing
[ 2927.017693] BTRFS warning (device vdc): devid 1 uuid 4335c7a6-652c-4389-8ea9-270c00fa9880 is missing
[ 2927.022921] BTRFS info (device vdc): bdev /dev/vdd errs: wr 0, rd 0, flush 0, corrupt 6000, gen 0
[ 2927.031780] BTRFS info (device vdc): checking UUID tree
[ 2927.045348] BTRFS: error (device vdc: state X) in __btrfs_free_extent:3199: errno=-5 IO failure
[ 2927.049729] BTRFS info (device vdc: state EX): forced readonly
[ 2927.051787] BTRFS: error (device vdc: state EX) in btrfs_run_delayed_refs:2159: errno=-5 IO failure
[ 2927.058758] BTRFS info (device vdc: state EX): balance: resume -dusage=90 -musage=90 -susage=90
[ 2927.062457] assertion failed: sb_write_started(fs_info->sb), in fs/btrfs/volumes.c:3244
[ 2927.066121] ------------[ cut here ]------------
[ 2927.067682] kernel BUG at fs/btrfs/ctree.h:3552!
[ 2927.069214] invalid opcode: 0000 [#1] PREEMPT SMP
[ 2927.070926] CPU: 2 PID: 22817 Comm: btrfs-balance Not tainted 5.17.0-rc5-default+ #1632
[ 2927.075299] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[ 2927.080897] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
[ 2927.092652] RSP: 0018:ffffaed9c610fdc0 EFLAGS: 00010246
[ 2927.095227] RAX: 000000000000004b RBX: ffffa13a873db000 RCX: 0000000000000000
[ 2927.096898] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 00000000ffffffff
[ 2927.100514] RBP: ffffa13a55324000 R08: 0000000000000003 R09: 0000000000000001
[ 2927.102518] R10: 0000000000000000 R11: 0000000000000001 R12: ffffa13a6922f098
[ 2927.104330] R13: 000000008cfa0000 R14: ffffa13a553262a0 R15: ffffa13a873db000
[ 2927.106025] FS: 0000000000000000(0000) GS:ffffa13abda00000(0000) knlGS:0000000000000000
[ 2927.108652] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2927.110568] CR2: 000055fdf2a94fd0 CR3: 000000005d012005 CR4: 0000000000170ea0
[ 2927.112167] Call Trace:
[ 2927.112801] <TASK>
[ 2927.113212] btrfs_relocate_chunk.cold+0x42/0x67 [btrfs]
[ 2927.114328] __btrfs_balance+0x2ea/0x490 [btrfs]
[ 2927.114871] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 131072 csum 0x7e797e3e expected csum 0x8941f998 mirror 2
[ 2927.115469] btrfs_balance+0x4ed/0x7e0 [btrfs]
[ 2927.118802] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 139264 csum 0x27df6522 expected csum 0x8941f998 mirror 2
[ 2927.119691] ? btrfs_balance+0x7e0/0x7e0 [btrfs]
[ 2927.123158] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 143360 csum 0x9f144c35 expected csum 0x8941f998 mirror 2
[ 2927.123965] balance_kthread+0x37/0x50 [btrfs]
[ 2927.127299] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 147456 csum 0x1027ab9a expected csum 0x8941f998 mirror 2
[ 2927.128016] kthread+0xea/0x110
[ 2927.128023] ? kthread_complete_and_exit+0x20/0x20
[ 2927.128027] ret_from_fork+0x1f/0x30
[ 2927.128031] </TASK>
[ 2927.128032] Modules linked in:
[ 2927.131390] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 155648 csum 0x428b86d5 expected csum 0x8941f998 mirror 2
[ 2927.131400] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 163840 csum 0x8fff7df2 expected csum 0x8941f998 mirror 2
[ 2927.131401] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 159744 csum 0x9893a835 expected csum 0x8941f998 mirror 2
[ 2927.131416] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 180224 csum 0x83d83877 expected csum 0x8941f998 mirror 2
[ 2927.131832] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 524288 csum 0x1a0c8fd4 expected csum 0x8941f998 mirror 2
[ 2927.132128] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 540672 csum 0xcaaf83cc expected csum 0x8941f998 mirror 2
[ 2927.133105] dm_flakey dm_mod btrfs blake2b_generic libcrc32c crc32c_intel xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash loop
[ 2927.144290] ---[ end trace 0000000000000000 ]---
[ 2927.145080] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
[ 2927.147738] RSP: 0018:ffffaed9c610fdc0 EFLAGS: 00010246
[ 2927.148220] RAX: 000000000000004b RBX: ffffa13a873db000 RCX: 0000000000000000
[ 2927.149126] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 00000000ffffffff
[ 2927.150057] RBP: ffffa13a55324000 R08: 0000000000000003 R09: 0000000000000001
[ 2927.150676] R10: 0000000000000000 R11: 0000000000000001 R12: ffffa13a6922f098
[ 2927.151297] R13: 000000008cfa0000 R14: ffffa13a553262a0 R15: ffffa13a873db000
[ 2927.152529] FS: 0000000000000000(0000) GS:ffffa13abda00000(0000) knlGS:0000000000000000
[ 2927.153646] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2927.154280] CR2: 000055fdf2a94fd0 CR3: 000000005d012005 CR4: 0000000000170ea0
On Wed, Feb 23, 2022 at 11:31:07AM +0100, David Sterba wrote:
> On Fri, Feb 18, 2022 at 01:14:19PM +0900, Naohiro Aota wrote:
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -3240,6 +3240,9 @@ int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
> >  	u64 length;
> >  	int ret;
> >
> > +	/* Assert we called sb_start_write(), not to race with FS freezing */
> > +	ASSERT(sb_write_started(fs_info->sb));
>
> I see this assertion fail, it's not on all testing VMs, but has
> happened a few times already so it's probably some race:
>
> [ 2927.013859] BTRFS warning (device vdc): devid 1 uuid 4335c7a6-652c-4389-8ea9-270c00fa9880 is missing
> [ 2927.017693] BTRFS warning (device vdc): devid 1 uuid 4335c7a6-652c-4389-8ea9-270c00fa9880 is missing
> [ 2927.022921] BTRFS info (device vdc): bdev /dev/vdd errs: wr 0, rd 0, flush 0, corrupt 6000, gen 0
> [ 2927.031780] BTRFS info (device vdc): checking UUID tree
> [ 2927.045348] BTRFS: error (device vdc: state X) in __btrfs_free_extent:3199: errno=-5 IO failure
> [ 2927.049729] BTRFS info (device vdc: state EX): forced readonly
> [ 2927.051787] BTRFS: error (device vdc: state EX) in btrfs_run_delayed_refs:2159: errno=-5 IO failure
> [ 2927.058758] BTRFS info (device vdc: state EX): balance: resume -dusage=90 -musage=90 -susage=90
> [ 2927.062457] assertion failed: sb_write_started(fs_info->sb), in fs/btrfs/volumes.c:3244
> [ 2927.066121] ------------[ cut here ]------------
> [ 2927.067682] kernel BUG at fs/btrfs/ctree.h:3552!
> [ 2927.069214] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 2927.070926] CPU: 2 PID: 22817 Comm: btrfs-balance Not tainted 5.17.0-rc5-default+ #1632
> [ 2927.075299] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> [ 2927.080897] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
> [ 2927.092652] RSP: 0018:ffffaed9c610fdc0 EFLAGS: 00010246
> [ 2927.095227] RAX: 000000000000004b RBX: ffffa13a873db000 RCX: 0000000000000000
> [ 2927.096898] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 00000000ffffffff
> [ 2927.100514] RBP: ffffa13a55324000 R08: 0000000000000003 R09: 0000000000000001
> [ 2927.102518] R10: 0000000000000000 R11: 0000000000000001 R12: ffffa13a6922f098
> [ 2927.104330] R13: 000000008cfa0000 R14: ffffa13a553262a0 R15: ffffa13a873db000
> [ 2927.106025] FS: 0000000000000000(0000) GS:ffffa13abda00000(0000) knlGS:0000000000000000
> [ 2927.108652] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2927.110568] CR2: 000055fdf2a94fd0 CR3: 000000005d012005 CR4: 0000000000170ea0
> [ 2927.112167] Call Trace:
> [ 2927.112801] <TASK>
> [ 2927.113212] btrfs_relocate_chunk.cold+0x42/0x67 [btrfs]
> [ 2927.114328] __btrfs_balance+0x2ea/0x490 [btrfs]
> [ 2927.114871] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 131072 csum 0x7e797e3e expected csum 0x8941f998 mirror 2
> [ 2927.115469] btrfs_balance+0x4ed/0x7e0 [btrfs]
> [ 2927.118802] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 139264 csum 0x27df6522 expected csum 0x8941f998 mirror 2
> [ 2927.119691] ? btrfs_balance+0x7e0/0x7e0 [btrfs]
> [ 2927.123158] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 143360 csum 0x9f144c35 expected csum 0x8941f998 mirror 2
> [ 2927.123965] balance_kthread+0x37/0x50 [btrfs]

It looks like this occurs when the balance is resumed. We also need
sb_{start,end}_write around btrfs_balance() in balance_kthread().

I guess we can cause a hang if we resume the balance and freeze the FS
at the same time.

> [ 2927.127299] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 147456 csum 0x1027ab9a expected csum 0x8941f998 mirror 2
> [ 2927.128016] kthread+0xea/0x110
> [ 2927.128023] ? kthread_complete_and_exit+0x20/0x20
> [ 2927.128027] ret_from_fork+0x1f/0x30
> [ 2927.128031] </TASK>
> [ 2927.128032] Modules linked in:
> [ 2927.131390] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 155648 csum 0x428b86d5 expected csum 0x8941f998 mirror 2
> [ 2927.131400] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 163840 csum 0x8fff7df2 expected csum 0x8941f998 mirror 2
> [ 2927.131401] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 159744 csum 0x9893a835 expected csum 0x8941f998 mirror 2
> [ 2927.131416] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 180224 csum 0x83d83877 expected csum 0x8941f998 mirror 2
> [ 2927.131832] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 524288 csum 0x1a0c8fd4 expected csum 0x8941f998 mirror 2
> [ 2927.132128] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 540672 csum 0xcaaf83cc expected csum 0x8941f998 mirror 2
> [ 2927.133105] dm_flakey dm_mod btrfs blake2b_generic libcrc32c crc32c_intel xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash loop
> [ 2927.144290] ---[ end trace 0000000000000000 ]---
> [ 2927.145080] RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
> [ 2927.147738] RSP: 0018:ffffaed9c610fdc0 EFLAGS: 00010246
> [ 2927.148220] RAX: 000000000000004b RBX: ffffa13a873db000 RCX: 0000000000000000
> [ 2927.149126] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 00000000ffffffff
> [ 2927.150057] RBP: ffffa13a55324000 R08: 0000000000000003 R09: 0000000000000001
> [ 2927.150676] R10: 0000000000000000 R11: 0000000000000001 R12: ffffa13a6922f098
> [ 2927.151297] R13: 000000008cfa0000 R14: ffffa13a553262a0 R15: ffffa13a873db000
> [ 2927.152529] FS: 0000000000000000(0000) GS:ffffa13abda00000(0000) knlGS:0000000000000000
> [ 2927.153646] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2927.154280] CR2: 000055fdf2a94fd0 CR3: 000000005d012005 CR4: 0000000000170ea0
On Thu, Feb 24, 2022 at 02:15:58AM +0000, Naohiro Aota wrote:
> On Wed, Feb 23, 2022 at 11:31:07AM +0100, David Sterba wrote:
> > On Fri, Feb 18, 2022 at 01:14:19PM +0900, Naohiro Aota wrote:
> > [ 2927.114871] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 131072 csum 0x7e797e3e expected csum 0x8941f998 mirror 2
> > [ 2927.115469] btrfs_balance+0x4ed/0x7e0 [btrfs]
> > [ 2927.118802] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 139264 csum 0x27df6522 expected csum 0x8941f998 mirror 2
> > [ 2927.119691] ? btrfs_balance+0x7e0/0x7e0 [btrfs]
> > [ 2927.123158] BTRFS warning (device vdc: state EX): csum failed root 5 ino 258 off 143360 csum 0x9f144c35 expected csum 0x8941f998 mirror 2
> > [ 2927.123965] balance_kthread+0x37/0x50 [btrfs]
>
> It looks like this occurs when the balance is resumed. We also need
> sb_{start,end}_write around btrfs_balance() in balance_kthread().

Sounds plausible.

> I guess we can cause a hang if we resume the balance and freeze the FS
> at the same time.

The background balance starts only when the filesystem is mounted for
write, so right after the sb_rdonly check in open_ctree, but I think
you're right that a freeze during that can lead to a hang.
On Thu, Feb 24, 2022 at 02:15:58AM +0000, Naohiro Aota wrote:
> On Wed, Feb 23, 2022 at 11:31:07AM +0100, David Sterba wrote:
> > On Fri, Feb 18, 2022 at 01:14:19PM +0900, Naohiro Aota wrote:
> It looks like this occurs when the balance is resumed. We also need
> sb_{start,end}_write around btrfs_balance() in balance_kthread().
>
> I guess we can cause a hang if we resume the balance and freeze the FS
> at the same time.

We need to fix the missing write protection before the asserts can be
added, so I'll delete them from this patch and will submit the helpers
patch once we have fixed them all.
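The point of this exchange can be illustrated with a small userspace model. This is only a sketch, not kernel code: every toy_* name is hypothetical, the struct is not struct super_block, and unlike the real sb_start_write(), which sleeps until the filesystem is thawed, the model simply returns false when frozen. It shows why a balance resumed without write protection trips the new assertion, and why wrapping the resume path fixes that.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the superblock freeze state (hypothetical). */
struct toy_sb {
	int writers;	/* callers inside a start/end write section */
	bool frozen;	/* set once a freeze has completed */
};

/* Models sb_start_write(); reports failure instead of sleeping. */
bool toy_sb_start_write(struct toy_sb *sb)
{
	if (sb->frozen)
		return false;
	sb->writers++;
	return true;
}

/* Models sb_end_write(). */
void toy_sb_end_write(struct toy_sb *sb)
{
	assert(sb->writers > 0);
	sb->writers--;
}

/* Models sb_write_started(). */
bool toy_sb_write_started(const struct toy_sb *sb)
{
	return sb->writers > 0;
}

/* Models a freeze: it can only complete once all writers are gone. */
bool toy_freeze(struct toy_sb *sb)
{
	if (sb->writers > 0)
		return false;
	sb->frozen = true;
	return true;
}

/* Models btrfs_relocate_chunk(); -1 is where the kernel would ASSERT. */
int toy_relocate_chunk(struct toy_sb *sb)
{
	if (!toy_sb_write_started(sb))
		return -1;
	return 0;
}

/* Resume path as it is today in this model: no write protection. */
int toy_balance_resume_unprotected(struct toy_sb *sb)
{
	return toy_relocate_chunk(sb);
}

/* Resume path with the protection proposed in the thread. */
int toy_balance_resume_protected(struct toy_sb *sb)
{
	int ret;

	if (!toy_sb_start_write(sb))
		return -2;	/* frozen: the real call would wait here */
	ret = toy_relocate_chunk(sb);
	toy_sb_end_write(sb);
	return ret;
}
```

In this model the unprotected resume returns -1 (the assertion case), the protected resume returns 0 and leaves the writer count at zero, and once toy_freeze() has succeeded the protected resume no longer starts any relocation.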
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 3113f6d7f335..c22d287e020b 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1522,8 +1522,12 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 	if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
 		return;

-	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE))
+	sb_start_write(fs_info->sb);
+
+	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) {
+		sb_end_write(fs_info->sb);
 		return;
+	}

 	/*
 	 * Long running balances can keep us blocked here for eternity, so
@@ -1531,6 +1535,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 	 */
 	if (!mutex_trylock(&fs_info->reclaim_bgs_lock)) {
 		btrfs_exclop_finish(fs_info);
+		sb_end_write(fs_info->sb);
 		return;
 	}

@@ -1605,6 +1610,7 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
 	spin_unlock(&fs_info->unused_bgs_lock);
 	mutex_unlock(&fs_info->reclaim_bgs_lock);
 	btrfs_exclop_finish(fs_info);
+	sb_end_write(fs_info->sb);
 }

 void btrfs_reclaim_bgs(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fa7fee09e39b..74c8024d8f96 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -3240,6 +3240,9 @@ int btrfs_relocate_chunk(struct btrfs_fs_info *fs_info, u64 chunk_offset)
 	u64 length;
 	int ret;

+	/* Assert we called sb_start_write(), not to race with FS freezing */
+	ASSERT(sb_write_started(fs_info->sb));
+
 	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
 		btrfs_err(fs_info,
 			  "relocate: not supported on extent tree v2 yet");
@@ -8304,10 +8307,12 @@ static int relocating_repair_kthread(void *data)
 	target = cache->start;
 	btrfs_put_block_group(cache);

+	sb_start_write(fs_info->sb);
 	if (!btrfs_exclop_start(fs_info, BTRFS_EXCLOP_BALANCE)) {
 		btrfs_info(fs_info,
 			   "zoned: skip relocating block group %llu to repair: EBUSY",
 			   target);
+		sb_end_write(fs_info->sb);
 		return -EBUSY;
 	}

@@ -8335,6 +8340,7 @@ static int relocating_repair_kthread(void *data)
 		btrfs_put_block_group(cache);
 	mutex_unlock(&fs_info->reclaim_bgs_lock);
 	btrfs_exclop_finish(fs_info);
+	sb_end_write(fs_info->sb);

 	return ret;
 }
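The block-group.c hunks add an sb_end_write() on each early return as well as on the normal exit, so the write count is always dropped in reverse order of acquisition. A hedged userspace sketch of that control flow follows; struct toy_fs and toy_reclaim_work() are hypothetical names, plain fields stand in for the exclusive-operation state and reclaim_bgs_lock, and the model returns an int (the real worker returns void) purely so the paths can be told apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state; hypothetical stand-in for the superblock and fs_info. */
struct toy_fs {
	int writers;		/* models the freeze-protection write count */
	bool exclop_busy;	/* models an exclusive operation in progress */
	bool reclaim_locked;	/* models reclaim_bgs_lock being held */
	int relocated;		/* number of simulated relocations */
};

/*
 * Models the patched reclaim worker. Returns 0 if a relocation ran and
 * -1 if the worker bailed out early. Every exit path releases what was
 * taken, in reverse order of acquisition.
 */
int toy_reclaim_work(struct toy_fs *fs)
{
	fs->writers++;				/* sb_start_write() */

	if (fs->exclop_busy) {			/* btrfs_exclop_start() failed */
		fs->writers--;			/* sb_end_write() */
		return -1;
	}
	fs->exclop_busy = true;

	if (fs->reclaim_locked) {		/* mutex_trylock() failed */
		fs->exclop_busy = false;	/* btrfs_exclop_finish() */
		fs->writers--;			/* sb_end_write() */
		return -1;
	}
	fs->reclaim_locked = true;

	assert(fs->writers > 0);		/* what the new ASSERT checks */
	fs->relocated++;			/* btrfs_relocate_chunk() */

	fs->reclaim_locked = false;		/* mutex_unlock() */
	fs->exclop_busy = false;		/* btrfs_exclop_finish() */
	fs->writers--;				/* sb_end_write() */
	return 0;
}
```

Whichever path is taken, the writer count is back at zero when the function returns, which is the property a later freeze depends on.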
There is a hung_task issue with running generic/068 on an SMR
device. The hang occurs while a process is trying to thaw the
filesystem. The process is trying to take sb->s_umount to thaw the
FS. The lock is held by fsstress, which calls btrfs_sync_fs() and is
waiting for an ordered extent to finish. However, as the FS is frozen,
the ordered extent never finishes.

Having an ordered extent while the FS is frozen is the root cause of
the hang. The ordered extent is initiated from btrfs_relocate_chunk()
which is called from btrfs_reclaim_bgs_work().

This commit adds sb_*_write() around the btrfs_relocate_chunk() call
site. For the usual "btrfs balance" command, we already call it with
mnt_want_write_file() in btrfs_ioctl_balance().

Additionally, add an ASSERT in btrfs_relocate_chunk() to check it is
properly called.

Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
Cc: stable@vger.kernel.org # 5.13+
Link: https://github.com/naota/linux/issues/56
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/block-group.c | 8 +++++++-
 fs/btrfs/volumes.c     | 6 ++++++
 2 files changed, 13 insertions(+), 1 deletion(-)