Message ID | 3f4f2a0ff1a6c818050434288925bdcf3cd719e5.1709124777.git.naohiro.aota@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: zoned: prevent extent buffer header to be cleared | expand |
On Wed, Feb 28, 2024 at 09:53:03PM +0900, Naohiro Aota wrote: > Btrfs clears the content of an extent buffer marked as > EXTENT_BUFFER_ZONED_ZEROOUT before the bio submission. This mechanism is > introduced to prevent a write hole of an extent buffer, which is once > allocated, marked dirty, but turns out unnecessary and cleaned up within > one transaction operation. > > However, btrfs_free_tree_block() can be called on an extent buffer after > its content is cleared. Then, it inserts a faulty delayed reference entry, > which makes the FS corrupted. > > This bug can be triggered running generic/013 several (~200) times. It > failed as following: > > ------------[ cut here ]------------ > WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3248 __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6 > CPU: 9 PID: 29834 Comm: fsstress Not tainted 6.8.0-rc4-BTRFS-ZNS+ #403 > Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021 > RIP: 0010:__btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > Code: 8b 3f e8 bf 69 00 00 48 8b 7d 60 45 8b 4f 40 49 89 d8 8b 54 24 40 4c 89 e9 48 c7 c6 30 64 65 a0 e8 61 fb 0d 00 e9 8f fd ff ff <0f> 0b f0 48 0f ba 28 02 41 b8 00 00 00 00 0f 83 86 04 00 00 b9 8b > RSP: 0018:ffffc900090cfb80 EFLAGS: 00010246 > RAX: ffff888365c719d8 RBX: 0000000f9677c000 RCX: 0000000000000001 > RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 > RBP: ffff8889a044b220 R08: 0000000000000000 R09: 0000000000000004 > R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000001 > R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80 > FS: 00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0 > Call Trace: > <TASK> > ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > ? __warn+0x81/0x170 > ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > ? report_bug+0x18d/0x1c0 > ? handle_bug+0x3c/0x70 > ? exc_invalid_op+0x13/0x60 > ? asm_exc_invalid_op+0x16/0x20 > ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > ? srso_return_thunk+0x5/0x5f > ? rcu_is_watching+0xd/0x40 > ? srso_return_thunk+0x5/0x5f > ? lock_release+0x1e5/0x280 > __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs] > ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs] > btrfs_run_delayed_refs+0x92/0x130 [btrfs] > btrfs_commit_transaction+0xa2/0x12d0 [btrfs] > ? srso_return_thunk+0x5/0x5f > ? srso_return_thunk+0x5/0x5f > ? rcu_is_watching+0xd/0x40 > ? srso_return_thunk+0x5/0x5f > ? lock_release+0x1e5/0x280 > btrfs_sync_file+0x532/0x660 [btrfs] > __x64_sys_fsync+0x37/0x60 > do_syscall_64+0x79/0x1a0 > entry_SYSCALL_64_after_hwframe+0x6e/0x76 > RIP: 0033:0x7f823f6f8400 > Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c > RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a > RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400 > RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003 > RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c > R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10 > R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180 > </TASK> > irq event stamp: 0 > hardirqs last enabled at (0): [<0000000000000000>] 0x0 > hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 > softirqs last enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 > softirqs last disabled at (0): [<0000000000000000>] 0x0 > ---[ end trace 0000000000000000 ]--- > ------------[ cut here ]------------ > BTRFS: Transaction aborted (error -117) > WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3249 __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6 > CPU: 9 PID: 29834 Comm: fsstress Tainted: G W 6.8.0-rc4-BTRFS-ZNS+ #403 > Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021 > RIP: 0010:__btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > Code: 48 c7 c6 40 5d 65 a0 e8 f0 f1 0d 00 c7 44 24 18 01 00 00 00 e9 ed f7 ff ff be 8b ff ff ff 48 c7 c7 68 5d 65 a0 e8 52 69 c1 e0 <0f> 0b e9 30 fb ff ff 48 8b 45 60 48 05 d8 19 00 00 f0 48 0f ba 28 > RSP: 0018:ffffc900090cfb80 EFLAGS: 00010282 > RAX: 0000000000000000 RBX: 0000000f9677c000 RCX: 0000000000000000 > RDX: 0000000000000002 RSI: ffffffff82464302 RDI: 00000000ffffffff > RBP: ffff8889a044b220 R08: 0000000000009ffb R09: 00000000ffffdfff > R10: 00000000ffffdfff R11: ffffffff8264dd80 R12: 0000000000000001 > R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80 > FS: 00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0 > Call Trace: > <TASK> > ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > ? __warn+0x81/0x170 > ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > ? report_bug+0x18d/0x1c0 > ? tick_nohz_tick_stopped+0x12/0x30 > ? srso_return_thunk+0x5/0x5f > ? handle_bug+0x3c/0x70 > ? exc_invalid_op+0x13/0x60 > ? asm_exc_invalid_op+0x16/0x20 > ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > ? srso_return_thunk+0x5/0x5f > ? rcu_is_watching+0xd/0x40 > ? srso_return_thunk+0x5/0x5f > ? lock_release+0x1e5/0x280 > __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs] > ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs] > btrfs_run_delayed_refs+0x92/0x130 [btrfs] > btrfs_commit_transaction+0xa2/0x12d0 [btrfs] > ? srso_return_thunk+0x5/0x5f > ? srso_return_thunk+0x5/0x5f > ? rcu_is_watching+0xd/0x40 > ? srso_return_thunk+0x5/0x5f > ? lock_release+0x1e5/0x280 > btrfs_sync_file+0x532/0x660 [btrfs] > __x64_sys_fsync+0x37/0x60 > do_syscall_64+0x79/0x1a0 > entry_SYSCALL_64_after_hwframe+0x6e/0x76 > RIP: 0033:0x7f823f6f8400 > Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c > RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a > RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400 > RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003 > RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c > R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10 > R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180 > </TASK> > irq event stamp: 0 > hardirqs last enabled at (0): [<0000000000000000>] 0x0 > hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 > softirqs last enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 > softirqs last disabled at (0): [<0000000000000000>] 0x0 > ---[ end trace 0000000000000000 ]--- > BTRFS: error (device nvme1n2: state A) in __btrfs_free_extent:3249: errno=-117 Filesystem corrupted > BTRFS info (device nvme1n2: state EA): forced readonly > BTRFS info (device nvme1n2: state EA): leaf 66957836288 gen 3873 total ptrs 203 free space 1102 owner 2 > BTRFS info (device nvme1n2: state EA): refs 2 lock_owner 29834 current 29834 > item 0 key (63394947072 168 40960) itemoff 16230 itemsize 53 > extent refs 1 gen 3835 flags 1 > ref#0: extent data backref root 5 objectid 552 offset 1802240 count 1 > (snip)... > item 164 key (66948923392 169 0) itemoff 8229 itemsize 33 > extent refs 1 gen 3872 flags 2 > ref#0: tree block backref root 2 > item 165 key (66948939776 169 1) itemoff 8196 itemsize 33 > extent refs 1 gen 3873 flags 2 > ref#0: tree block backref root 5 > item 166 key (68719476736 168 110592) itemoff 8143 itemsize 53 > extent refs 1 gen 3841 flags 1 > ref#0: extent data backref root 5 objectid 440 offset 3100672 count 1 > (snip)... > item 202 key (68722249728 168 110592) itemoff 6177 itemsize 53 > extent refs 1 gen 3842 flags 1 > ref#0: extent data backref root 5 objectid 953 offset 5431296 count 1 > BTRFS critical (device nvme1n2: state EA): unable to find ref byte nr 66948939776 parent 66948939776 root 5 owner 0 offset 0 slot 166 > BTRFS error (device nvme1n2: state EA): failed to run delayed ref for logical 66948939776 num_bytes 16384 type 182 action 2 ref_mod 1: -2 > BTRFS: error (device nvme1n2: state EA) in btrfs_run_delayed_refs:2246: errno=-2 No such entry > BTRFS warning (device nvme1n2: state EA): Skipping commit of aborted transaction. > BTRFS: error (device nvme1n2: state EA) in cleanup_transaction:2006: errno=-2 No such entry > > This happens maybe because clearing the contents is too early. It should > clear the content after all the reference to the node is dropped. If you have such suspicion you can add assertions to validate the state, bits and other constraints. > Addressing that root cause needs more time. Until that, leave the extent > buffer header intact, so that we can add a proper delayed reference entry. > > Fixes: aa6313e6ff2b ("btrfs: zoned: don't clear dirty flag of extent buffer") > Link: https://lore.kernel.org/linux-btrfs/oadvdekkturysgfgi4qzuemd57zudeasynswurjxw3ocdfsef6@sjyufeugh63f/ > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > --- > fs/btrfs/disk-io.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index a2e45ed6ef14..8aaed8719394 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -278,7 +278,9 @@ blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio) > * ordering of I/O without unnecessarily writing out data. > */ > if (test_bit(EXTENT_BUFFER_ZONED_ZEROOUT, &eb->bflags)) { > - memzero_extent_buffer(eb, 0, eb->len); > + const unsigned long header_size = sizeof(struct btrfs_header); > + > + memzero_extent_buffer(eb, header_size, eb->len - header_size); So this means anything that finds the buffer will have to rely on the state in the header and if it's with ZONED_ZEROOUT then stop processing it. btree_csum_one_bio() is the only function that checks the bit AFAICS. Otherwise, if this is a sufficient fix we'd need it for 6.8.
On Wed, Feb 28, 2024 at 02:22:49PM +0100, David Sterba wrote: > On Wed, Feb 28, 2024 at 09:53:03PM +0900, Naohiro Aota wrote: > > Btrfs clears the content of an extent buffer marked as > > EXTENT_BUFFER_ZONED_ZEROOUT before the bio submission. This mechanism is > > introduced to prevent a write hole of an extent buffer, which is once > > allocated, marked dirty, but turns out unnecessary and cleaned up within > > one transaction operation. > > > > However, btrfs_free_tree_block() can be called on an extent buffer after > > its content is cleared. Then, it inserts a faulty delayed reference entry, > > which makes the FS corrupted. > > > > This bug can be triggered running generic/013 several (~200) times. It > > failed as following: > > > > ------------[ cut here ]------------ > > WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3248 __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > > Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6 > > CPU: 9 PID: 29834 Comm: fsstress Not tainted 6.8.0-rc4-BTRFS-ZNS+ #403 > > Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021 > > RIP: 0010:__btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > > Code: 8b 3f e8 bf 69 00 00 48 8b 7d 60 45 8b 4f 40 49 89 d8 8b 54 24 40 4c 89 e9 48 c7 c6 30 64 65 a0 e8 61 fb 0d 00 e9 8f fd ff ff <0f> 0b f0 48 0f ba 28 02 41 b8 00 00 00 00 0f 83 86 04 00 00 b9 8b > > RSP: 0018:ffffc900090cfb80 EFLAGS: 00010246 > > RAX: ffff888365c719d8 RBX: 0000000f9677c000 RCX: 0000000000000001 > > RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 > > RBP: ffff8889a044b220 R08: 0000000000000000 R09: 0000000000000004 > > R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000001 > > R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80 > > FS: 00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0 > > Call Trace: > > <TASK> > > ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > > ? __warn+0x81/0x170 > > ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > > ? report_bug+0x18d/0x1c0 > > ? handle_bug+0x3c/0x70 > > ? exc_invalid_op+0x13/0x60 > > ? asm_exc_invalid_op+0x16/0x20 > > ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > > ? srso_return_thunk+0x5/0x5f > > ? rcu_is_watching+0xd/0x40 > > ? srso_return_thunk+0x5/0x5f > > ? lock_release+0x1e5/0x280 > > __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs] > > ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs] > > btrfs_run_delayed_refs+0x92/0x130 [btrfs] > > btrfs_commit_transaction+0xa2/0x12d0 [btrfs] > > ? srso_return_thunk+0x5/0x5f > > ? srso_return_thunk+0x5/0x5f > > ? rcu_is_watching+0xd/0x40 > > ? srso_return_thunk+0x5/0x5f > > ? lock_release+0x1e5/0x280 > > btrfs_sync_file+0x532/0x660 [btrfs] > > __x64_sys_fsync+0x37/0x60 > > do_syscall_64+0x79/0x1a0 > > entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > RIP: 0033:0x7f823f6f8400 > > Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c > > RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a > > RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400 > > RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003 > > RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c > > R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10 > > R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180 > > </TASK> > > irq event stamp: 0 > > hardirqs last enabled at (0): [<0000000000000000>] 0x0 > > hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 > > softirqs last enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 > > softirqs last disabled at (0): [<0000000000000000>] 0x0 > > ---[ end trace 0000000000000000 ]--- > > ------------[ cut here ]------------ > > BTRFS: Transaction aborted (error -117) > > WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3249 __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > > Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6 > > CPU: 9 PID: 29834 Comm: fsstress Tainted: G W 6.8.0-rc4-BTRFS-ZNS+ #403 > > Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021 > > RIP: 0010:__btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > > Code: 48 c7 c6 40 5d 65 a0 e8 f0 f1 0d 00 c7 44 24 18 01 00 00 00 e9 ed f7 ff ff be 8b ff ff ff 48 c7 c7 68 5d 65 a0 e8 52 69 c1 e0 <0f> 0b e9 30 fb ff ff 48 8b 45 60 48 05 d8 19 00 00 f0 48 0f ba 28 > > RSP: 0018:ffffc900090cfb80 EFLAGS: 00010282 > > RAX: 0000000000000000 RBX: 0000000f9677c000 RCX: 0000000000000000 > > RDX: 0000000000000002 RSI: ffffffff82464302 RDI: 00000000ffffffff > > RBP: ffff8889a044b220 R08: 0000000000009ffb R09: 00000000ffffdfff > > R10: 00000000ffffdfff R11: ffffffff8264dd80 R12: 0000000000000001 > > R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80 > > FS: 00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0 > > Call Trace: > > <TASK> > > ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > > ? __warn+0x81/0x170 > > ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > > ? report_bug+0x18d/0x1c0 > > ? tick_nohz_tick_stopped+0x12/0x30 > > ? srso_return_thunk+0x5/0x5f > > ? handle_bug+0x3c/0x70 > > ? exc_invalid_op+0x13/0x60 > > ? asm_exc_invalid_op+0x16/0x20 > > ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > > ? srso_return_thunk+0x5/0x5f > > ? rcu_is_watching+0xd/0x40 > > ? srso_return_thunk+0x5/0x5f > > ? lock_release+0x1e5/0x280 > > __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs] > > ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs] > > btrfs_run_delayed_refs+0x92/0x130 [btrfs] > > btrfs_commit_transaction+0xa2/0x12d0 [btrfs] > > ? srso_return_thunk+0x5/0x5f > > ? srso_return_thunk+0x5/0x5f > > ? rcu_is_watching+0xd/0x40 > > ? srso_return_thunk+0x5/0x5f > > ? lock_release+0x1e5/0x280 > > btrfs_sync_file+0x532/0x660 [btrfs] > > __x64_sys_fsync+0x37/0x60 > > do_syscall_64+0x79/0x1a0 > > entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > RIP: 0033:0x7f823f6f8400 > > Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c > > RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a > > RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400 > > RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003 > > RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c > > R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10 > > R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180 > > </TASK> > > irq event stamp: 0 > > hardirqs last enabled at (0): [<0000000000000000>] 0x0 > > hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 > > softirqs last enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 > > softirqs last disabled at (0): [<0000000000000000>] 0x0 > > ---[ end trace 0000000000000000 ]--- > > BTRFS: error (device nvme1n2: state A) in __btrfs_free_extent:3249: errno=-117 Filesystem corrupted > > BTRFS info (device nvme1n2: state EA): forced readonly > > BTRFS info (device nvme1n2: state EA): leaf 66957836288 gen 3873 total ptrs 203 free space 1102 owner 2 > > BTRFS info (device nvme1n2: state EA): refs 2 lock_owner 29834 current 29834 > > item 0 key (63394947072 168 40960) itemoff 16230 itemsize 53 > > extent refs 1 gen 3835 flags 1 > > ref#0: extent data backref root 5 objectid 552 offset 1802240 count 1 > > (snip)... > > item 164 key (66948923392 169 0) itemoff 8229 itemsize 33 > > extent refs 1 gen 3872 flags 2 > > ref#0: tree block backref root 2 > > item 165 key (66948939776 169 1) itemoff 8196 itemsize 33 > > extent refs 1 gen 3873 flags 2 > > ref#0: tree block backref root 5 > > item 166 key (68719476736 168 110592) itemoff 8143 itemsize 53 > > extent refs 1 gen 3841 flags 1 > > ref#0: extent data backref root 5 objectid 440 offset 3100672 count 1 > > (snip)... > > item 202 key (68722249728 168 110592) itemoff 6177 itemsize 53 > > extent refs 1 gen 3842 flags 1 > > ref#0: extent data backref root 5 objectid 953 offset 5431296 count 1 > > BTRFS critical (device nvme1n2: state EA): unable to find ref byte nr 66948939776 parent 66948939776 root 5 owner 0 offset 0 slot 166 > > BTRFS error (device nvme1n2: state EA): failed to run delayed ref for logical 66948939776 num_bytes 16384 type 182 action 2 ref_mod 1: -2 > > BTRFS: error (device nvme1n2: state EA) in btrfs_run_delayed_refs:2246: errno=-2 No such entry > > BTRFS warning (device nvme1n2: state EA): Skipping commit of aborted transaction. > > BTRFS: error (device nvme1n2: state EA) in cleanup_transaction:2006: errno=-2 No such entry > > > > This happens maybe because clearing the contents is too early. It should > > clear the content after all the reference to the node is dropped. > > If you have such suspicion you can add assertions to validate the state, > bits and other constraints. It is difficult to assert the node-to-node reference because we need to query the tree. But, I'm going to add other assertion to check if the delayed ref is not faulty. > > > Addressing that root cause needs more time. Until that, leave the extent > > buffer header intact, so that we can add a proper delayed reference entry. > > > > Fixes: aa6313e6ff2b ("btrfs: zoned: don't clear dirty flag of extent buffer") > > Link: https://lore.kernel.org/linux-btrfs/oadvdekkturysgfgi4qzuemd57zudeasynswurjxw3ocdfsef6@sjyufeugh63f/ > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > > --- > > fs/btrfs/disk-io.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > > index a2e45ed6ef14..8aaed8719394 100644 > > --- a/fs/btrfs/disk-io.c > > +++ b/fs/btrfs/disk-io.c > > @@ -278,7 +278,9 @@ blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio) > > * ordering of I/O without unnecessarily writing out data. > > */ > > if (test_bit(EXTENT_BUFFER_ZONED_ZEROOUT, &eb->bflags)) { > > - memzero_extent_buffer(eb, 0, eb->len); > > + const unsigned long header_size = sizeof(struct btrfs_header); > > + > > + memzero_extent_buffer(eb, header_size, eb->len - header_size); > > So this means anything that finds the buffer will have to rely on the > state in the header and if it's with ZONED_ZEROOUT then stop processing > it. btree_csum_one_bio() is the only function that checks the bit > AFAICS. Yes, indeed it's cumbersome to check the flag for every btrfs_header_* usage. So, I found another solution and I'm testing it. It looks like adding EXTENT_BUFFER_ZONED_ZEROOUT even when an extent buffer is not DIRTY is causing the bug. > Otherwise, if this is a sufficient fix we'd need it for 6.8.
On Tue, Mar 05, 2024 at 06:57:48AM +0000, Naohiro Aota wrote: > On Wed, Feb 28, 2024 at 02:22:49PM +0100, David Sterba wrote: > > On Wed, Feb 28, 2024 at 09:53:03PM +0900, Naohiro Aota wrote: > > > Btrfs clears the content of an extent buffer marked as > > > EXTENT_BUFFER_ZONED_ZEROOUT before the bio submission. This mechanism is > > > introduced to prevent a write hole of an extent buffer, which is once > > > allocated, marked dirty, but turns out unnecessary and cleaned up within > > > one transaction operation. > > > > > > However, btrfs_free_tree_block() can be called on an extent buffer after > > > its content is cleared. Then, it inserts a faulty delayed reference entry, > > > which makes the FS corrupted. > > > > > > This bug can be triggered running generic/013 several (~200) times. It > > > failed as following: > > > > > > ------------[ cut here ]------------ > > > WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3248 __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > > > Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6 > > > CPU: 9 PID: 29834 Comm: fsstress Not tainted 6.8.0-rc4-BTRFS-ZNS+ #403 > > > Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021 > > > RIP: 0010:__btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > > > Code: 8b 3f e8 bf 69 00 00 48 8b 7d 60 45 8b 4f 40 49 89 d8 8b 54 24 40 4c 89 e9 48 c7 c6 30 64 65 a0 e8 61 fb 0d 00 e9 8f fd ff ff <0f> 0b f0 48 0f ba 28 02 41 b8 00 00 00 00 0f 83 86 04 00 00 b9 8b > > > RSP: 0018:ffffc900090cfb80 EFLAGS: 00010246 > > > RAX: ffff888365c719d8 RBX: 0000000f9677c000 RCX: 0000000000000001 > > > RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 > > > RBP: ffff8889a044b220 R08: 0000000000000000 R09: 0000000000000004 > > > R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000001 > > > R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80 > > > FS: 00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0 > > > Call Trace: > > > <TASK> > > > ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > > > ? __warn+0x81/0x170 > > > ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > > > ? report_bug+0x18d/0x1c0 > > > ? handle_bug+0x3c/0x70 > > > ? exc_invalid_op+0x13/0x60 > > > ? asm_exc_invalid_op+0x16/0x20 > > > ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] > > > ? srso_return_thunk+0x5/0x5f > > > ? rcu_is_watching+0xd/0x40 > > > ? srso_return_thunk+0x5/0x5f > > > ? lock_release+0x1e5/0x280 > > > __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs] > > > ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs] > > > btrfs_run_delayed_refs+0x92/0x130 [btrfs] > > > btrfs_commit_transaction+0xa2/0x12d0 [btrfs] > > > ? srso_return_thunk+0x5/0x5f > > > ? srso_return_thunk+0x5/0x5f > > > ? rcu_is_watching+0xd/0x40 > > > ? srso_return_thunk+0x5/0x5f > > > ? lock_release+0x1e5/0x280 > > > btrfs_sync_file+0x532/0x660 [btrfs] > > > __x64_sys_fsync+0x37/0x60 > > > do_syscall_64+0x79/0x1a0 > > > entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > > RIP: 0033:0x7f823f6f8400 > > > Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c > > > RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a > > > RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400 > > > RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003 > > > RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c > > > R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10 > > > R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180 > > > </TASK> > > > irq event stamp: 0 > > > hardirqs last enabled at (0): [<0000000000000000>] 0x0 > > > hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 > > > softirqs last enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 > > > softirqs last disabled at (0): [<0000000000000000>] 0x0 > > > ---[ end trace 0000000000000000 ]--- > > > ------------[ cut here ]------------ > > > BTRFS: Transaction aborted (error -117) > > > WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3249 __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > > > Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6 > > > CPU: 9 PID: 29834 Comm: fsstress Tainted: G W 6.8.0-rc4-BTRFS-ZNS+ #403 > > > Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021 > > > RIP: 0010:__btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > > > Code: 48 c7 c6 40 5d 65 a0 e8 f0 f1 0d 00 c7 44 24 18 01 00 00 00 e9 ed f7 ff ff be 8b ff ff ff 48 c7 c7 68 5d 65 a0 e8 52 69 c1 e0 <0f> 0b e9 30 fb ff ff 48 8b 45 60 48 05 d8 19 00 00 f0 48 0f ba 28 > > > RSP: 0018:ffffc900090cfb80 EFLAGS: 00010282 > > > RAX: 0000000000000000 RBX: 0000000f9677c000 RCX: 0000000000000000 > > > RDX: 0000000000000002 RSI: ffffffff82464302 RDI: 00000000ffffffff > > > RBP: ffff8889a044b220 R08: 0000000000009ffb R09: 00000000ffffdfff > > > R10: 00000000ffffdfff R11: ffffffff8264dd80 R12: 0000000000000001 > > > R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80 > > > FS: 00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000 > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0 > > > Call Trace: > > > <TASK> > > > ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > > > ? __warn+0x81/0x170 > > > ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > > > ? report_bug+0x18d/0x1c0 > > > ? tick_nohz_tick_stopped+0x12/0x30 > > > ? srso_return_thunk+0x5/0x5f > > > ? handle_bug+0x3c/0x70 > > > ? exc_invalid_op+0x13/0x60 > > > ? asm_exc_invalid_op+0x16/0x20 > > > ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] > > > ? srso_return_thunk+0x5/0x5f > > > ? rcu_is_watching+0xd/0x40 > > > ? srso_return_thunk+0x5/0x5f > > > ? lock_release+0x1e5/0x280 > > > __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs] > > > ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs] > > > btrfs_run_delayed_refs+0x92/0x130 [btrfs] > > > btrfs_commit_transaction+0xa2/0x12d0 [btrfs] > > > ? srso_return_thunk+0x5/0x5f > > > ? srso_return_thunk+0x5/0x5f > > > ? rcu_is_watching+0xd/0x40 > > > ? srso_return_thunk+0x5/0x5f > > > ? lock_release+0x1e5/0x280 > > > btrfs_sync_file+0x532/0x660 [btrfs] > > > __x64_sys_fsync+0x37/0x60 > > > do_syscall_64+0x79/0x1a0 > > > entry_SYSCALL_64_after_hwframe+0x6e/0x76 > > > RIP: 0033:0x7f823f6f8400 > > > Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c > > > RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a > > > RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400 > > > RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003 > > > RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c > > > R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10 > > > R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180 > > > </TASK> > > > irq event stamp: 0 > > > hardirqs last enabled at (0): [<0000000000000000>] 0x0 > > > hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 > > > softirqs last enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 > > > softirqs last disabled at (0): [<0000000000000000>] 0x0 > > > ---[ end trace 0000000000000000 ]--- > > > BTRFS: error (device nvme1n2: state A) in __btrfs_free_extent:3249: errno=-117 Filesystem corrupted > > > BTRFS info (device nvme1n2: state EA): forced readonly > > > BTRFS info (device nvme1n2: state EA): leaf 66957836288 gen 3873 total ptrs 203 free space 1102 owner 2 > > > BTRFS info (device nvme1n2: state EA): refs 2 lock_owner 29834 current 29834 > > > item 0 key (63394947072 168 40960) itemoff 16230 itemsize 53 > > > extent refs 1 gen 3835 flags 1 > > > ref#0: extent data backref root 5 objectid 552 offset 1802240 count 1 > > > (snip)... > > > item 164 key (66948923392 169 0) itemoff 8229 itemsize 33 > > > extent refs 1 gen 3872 flags 2 > > > ref#0: tree block backref root 2 > > > item 165 key (66948939776 169 1) itemoff 8196 itemsize 33 > > > extent refs 1 gen 3873 flags 2 > > > ref#0: tree block backref root 5 > > > item 166 key (68719476736 168 110592) itemoff 8143 itemsize 53 > > > extent refs 1 gen 3841 flags 1 > > > ref#0: extent data backref root 5 objectid 440 offset 3100672 count 1 > > > (snip)... > > > item 202 key (68722249728 168 110592) itemoff 6177 itemsize 53 > > > extent refs 1 gen 3842 flags 1 > > > ref#0: extent data backref root 5 objectid 953 offset 5431296 count 1 > > > BTRFS critical (device nvme1n2: state EA): unable to find ref byte nr 66948939776 parent 66948939776 root 5 owner 0 offset 0 slot 166 > > > BTRFS error (device nvme1n2: state EA): failed to run delayed ref for logical 66948939776 num_bytes 16384 type 182 action 2 ref_mod 1: -2 > > > BTRFS: error (device nvme1n2: state EA) in btrfs_run_delayed_refs:2246: errno=-2 No such entry > > > BTRFS warning (device nvme1n2: state EA): Skipping commit of aborted transaction. > > > BTRFS: error (device nvme1n2: state EA) in cleanup_transaction:2006: errno=-2 No such entry > > > > > > This happens maybe because clearing the contents is too early. It should > > > clear the content after all the reference to the node is dropped. > > > > If you have such suspicion you can add assertions to validate the state, > > bits and other constraints. > > It is difficult to assert the node-to-node reference because we need to > query the tree. But, I'm going to add other assertion to check if the > delayed ref is not faulty. > I added the following assert (currently, it's WARN to make my debug session easy) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index beedd6ed64d3..eb0918df047e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3464,6 +3464,14 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans, if (root_id != BTRFS_TREE_LOG_OBJECTID) { struct btrfs_ref generic_ref = { 0 }; + /* + * Assert that the extent buffer is not cleared due to + * EXTENT_BUFFER_ZONED_ZEROOUT. Please refer + * btrfs_clear_buffer_dirty() and btree_csum_one_bio() for + * detail. + */ + WARN_ON_ONCE(btrfs_header_bytenr(buf) == 0); + btrfs_init_generic_ref(&generic_ref, BTRFS_DROP_DELAYED_REF, buf->start, buf->len, parent, btrfs_header_owner(buf)); > > > > > Addressing that root cause needs more time. Until that, leave the extent > > > buffer header intact, so that we can add a proper delayed reference entry. > > > > > > Fixes: aa6313e6ff2b ("btrfs: zoned: don't clear dirty flag of extent buffer") > > > Link: https://lore.kernel.org/linux-btrfs/oadvdekkturysgfgi4qzuemd57zudeasynswurjxw3ocdfsef6@sjyufeugh63f/ > > > Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> > > > --- > > > fs/btrfs/disk-io.c | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > > > index a2e45ed6ef14..8aaed8719394 100644 > > > --- a/fs/btrfs/disk-io.c > > > +++ b/fs/btrfs/disk-io.c > > > @@ -278,7 +278,9 @@ blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio) > > > * ordering of I/O without unnecessarily writing out data. > > > */ > > > if (test_bit(EXTENT_BUFFER_ZONED_ZEROOUT, &eb->bflags)) { > > > - memzero_extent_buffer(eb, 0, eb->len); > > > + const unsigned long header_size = sizeof(struct btrfs_header); > > > + > > > + memzero_extent_buffer(eb, header_size, eb->len - header_size); > > > > So this means anything that finds the buffer will have to rely on the > > state in the header and if it's with ZONED_ZEROOUT then stop processing > > it. btree_csum_one_bio() is the only function that checks the bit > > AFAICS. > > Yes, indeed it's cumbersome to check the flag for every btrfs_header_* > usage. So, I found another solution and I'm testing it. It looks like > adding EXTENT_BUFFER_ZONED_ZEROOUT even when an extent buffer is not DIRTY > is causing the bug. And, tested this solution like diff below. It survives generic/013 100 times without hitting the WARN above. However, generic/561 hit the WARN as below. So, there is some other reason (than clearing non-dirty extent buffer) causing the issue. diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c7ecfcac34fc..70f23f51fda7 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4152,7 +4152,9 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans, * The actual zeroout of the buffer will happen later in * btree_csum_one_bio. */ if (btrfs_is_zoned(fs_info)) { + if (!test_bit(EXTENT_BUFFER_DIRTY, &eb->bflags)) + return; set_bit(EXTENT_BUFFER_ZONED_ZEROOUT, &eb->bflags); return; } [ 9014.582049] ------------[ cut here ]------------ [ 9014.598804] WARNING: CPU: 10 PID: 1716009 at fs/btrfs/extent-tree.c:3473 btrfs_free_tree_block+0x26b/0x2e0 [btrfs] [ 9014.621665] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_snapshot dm_bufio dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass ipmi_ssif acpi_cpufreq rapl ipmi_si i2c_piix4 btrfs k10temp ipmi_devintf blake2b_generic ipmi_msghandler xor raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6 [ 9014.730793] CPU: 10 PID: 1716009 Comm: fsstress Not tainted 6.8.0-rc7-BTRFS-ZNS+ #408 [ 9014.751296] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021 [ 9014.771359] RIP: 0010:btrfs_free_tree_block+0x26b/0x2e0 [btrfs] [ 9014.789863] Code: 0b e9 07 ff ff ff 8b 4b 08 48 89 f2 48 89 ef 41 b8 01 00 00 00 4c 89 e6 e8 d2 80 ff ff 4c 89 e7 e8 7a bb 0c 00 e9 e3 fe ff ff <0f> 0b e9 1c fe ff ff 0f 0b 65 8b 05 71 63 b8 5f 89 c0 48 0f a3 05 [ 9014.833425] RSP: 0018:ffffc90023ea7a70 EFLAGS: 00010246 [ 9014.851277] RAX: ffff888c4a4fb000 RBX: ffff88925d774c08 RCX: 0000000000000000 [ 9014.871007] RDX: 0000000000000000 RSI: 00000003324b0000 RDI: ffffc90023ea7ab8 [ 9014.890724] RBP: ffff889cf09cad98 R08: 0000000000000001 R09: 0000000000000000 [ 9014.910381] R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000002 [ 9014.929861] R13: ffffc90023ea7a70 R14: ffff889d6443c000 R15: 0000000000000001 [ 9014.949121] FS: 00007f3cb82e1740(0000) GS:ffff88a04e880000(0000) knlGS:0000000000000000 [ 9014.969242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9014.986837] CR2: 00007f8d3da345f0 CR3: 0000000111b0c000 CR4: 0000000000350ef0 [ 9015.005852] Call Trace: [ 9015.020105] <TASK> [ 9015.033830] ? btrfs_free_tree_block+0x26b/0x2e0 [btrfs] [ 9015.050743] ? __warn+0x81/0x170 [ 9015.065306] ? btrfs_free_tree_block+0x26b/0x2e0 [btrfs] [ 9015.082059] ? report_bug+0x18d/0x1c0 [ 9015.096896] ? handle_bug+0x3a/0x70 [ 9015.111278] ? exc_invalid_op+0x13/0x60 [ 9015.125813] ? asm_exc_invalid_op+0x16/0x20 [ 9015.140616] ? btrfs_free_tree_block+0x26b/0x2e0 [btrfs] [ 9015.156645] btrfs_del_leaf+0xc4/0xe0 [btrfs] [ 9015.171612] btrfs_del_items+0x467/0x520 [btrfs] [ 9015.186613] __btrfs_free_extent.isra.0+0x927/0x1370 [btrfs] [ 9015.202561] __btrfs_run_delayed_refs+0x40c/0x1280 [btrfs] [ 9015.218280] ? srso_untrain_ret+0x2/0x2 [ 9015.232261] ? srso_return_thunk+0x5/0x5f [ 9015.246289] ? srso_return_thunk+0x5/0x5f [ 9015.260043] ? rcu_is_watching+0xd/0x40 [ 9015.273399] ? srso_return_thunk+0x5/0x5f [ 9015.286705] ? lock_release+0x1e5/0x280 [ 9015.299739] ? btrfs_start_dirty_block_groups+0x105/0x5a0 [btrfs] [ 9015.315198] btrfs_run_delayed_refs+0x33/0x120 [btrfs] [ 9015.329620] btrfs_start_dirty_block_groups+0x3aa/0x5a0 [btrfs] [ 9015.344660] ? srso_return_thunk+0x5/0x5f [ 9015.357546] ? btrfs_commit_transaction+0x3e/0x12c0 [btrfs] [ 9015.372148] btrfs_commit_transaction+0xec/0x12c0 [btrfs] [ 9015.386520] ? srso_return_thunk+0x5/0x5f [ 9015.399364] ? btrfs_attach_transaction_barrier+0x1e/0x60 [btrfs] [ 9015.414306] ? __pfx_sync_fs_one_sb+0x10/0x10 [ 9015.427232] iterate_supers+0x7a/0xe0 [ 9015.439191] ksys_sync+0x60/0xa0 [ 9015.450453] __do_sys_sync+0xa/0x20 [ 9015.461714] do_syscall_64+0x77/0x1a0 [ 9015.472886] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 9015.485376] RIP: 0033:0x7f3cb83e2487 > > > Otherwise, if this is a sufficient fix we'd need it for 6.8.
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a2e45ed6ef14..8aaed8719394 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -278,7 +278,9 @@ blk_status_t btree_csum_one_bio(struct btrfs_bio *bbio) * ordering of I/O without unnecessarily writing out data. */ if (test_bit(EXTENT_BUFFER_ZONED_ZEROOUT, &eb->bflags)) { - memzero_extent_buffer(eb, 0, eb->len); + const unsigned long header_size = sizeof(struct btrfs_header); + + memzero_extent_buffer(eb, header_size, eb->len - header_size); return BLK_STS_OK; }
Btrfs clears the content of an extent buffer marked as EXTENT_BUFFER_ZONED_ZEROOUT before the bio submission. This mechanism is introduced to prevent a write hole of an extent buffer, which is once allocated, marked dirty, but turns out unnecessary and cleaned up within one transaction operation. However, btrfs_free_tree_block() can be called on an extent buffer after its content is cleared. Then, it inserts a faulty delayed reference entry, which makes the FS corrupted. This bug can be triggered running generic/013 several (~200) times. It failed as following: ------------[ cut here ]------------ WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3248 __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6 CPU: 9 PID: 29834 Comm: fsstress Not tainted 6.8.0-rc4-BTRFS-ZNS+ #403 Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021 RIP: 0010:__btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] Code: 8b 3f e8 bf 69 00 00 48 8b 7d 60 45 8b 4f 40 49 89 d8 8b 54 24 40 4c 89 e9 48 c7 c6 30 64 65 a0 e8 61 fb 0d 00 e9 8f fd ff ff <0f> 0b f0 48 0f ba 28 02 41 b8 00 00 00 00 0f 83 86 04 00 00 b9 8b RSP: 0018:ffffc900090cfb80 EFLAGS: 00010246 RAX: ffff888365c719d8 RBX: 0000000f9677c000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000 RBP: ffff8889a044b220 R08: 0000000000000000 R09: 0000000000000004 R10: 0000000000000000 R11: 00000000ffffffff R12: 0000000000000001 R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80 FS: 00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0 Call Trace: <TASK> ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] ? __warn+0x81/0x170 ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] ? report_bug+0x18d/0x1c0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x13/0x60 ? asm_exc_invalid_op+0x16/0x20 ? __btrfs_free_extent.isra.0+0x604/0x1330 [btrfs] ? srso_return_thunk+0x5/0x5f ? rcu_is_watching+0xd/0x40 ? srso_return_thunk+0x5/0x5f ? lock_release+0x1e5/0x280 __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs] ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs] btrfs_run_delayed_refs+0x92/0x130 [btrfs] btrfs_commit_transaction+0xa2/0x12d0 [btrfs] ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? rcu_is_watching+0xd/0x40 ? srso_return_thunk+0x5/0x5f ? lock_release+0x1e5/0x280 btrfs_sync_file+0x532/0x660 [btrfs] __x64_sys_fsync+0x37/0x60 do_syscall_64+0x79/0x1a0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 RIP: 0033:0x7f823f6f8400 Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400 RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003 RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10 R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180 </TASK> irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 softirqs last enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ BTRFS: Transaction aborted (error -117) WARNING: CPU: 9 PID: 29834 at fs/btrfs/extent-tree.c:3249 __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] Modules linked in: dm_flakey algif_hash af_alg xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables br_netfilter bridge stp llc overlay sunrpc kvm_amd kvm irqbypass rapl acpi_cpufreq ipmi_ssif i2c_piix4 k10temp btrfs ipmi_si blake2b_generic ipmi_devintf xor ipmi_msghandler raid6_pq bfq loop dm_mod zram bnxt_en ccp pkcs8_key_parser asn1_decoder public_key oid_registry fuse msr ipv6 CPU: 9 PID: 29834 Comm: fsstress Tainted: G W 6.8.0-rc4-BTRFS-ZNS+ #403 Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 2.0 02/22/2021 RIP: 0010:__btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] Code: 48 c7 c6 40 5d 65 a0 e8 f0 f1 0d 00 c7 44 24 18 01 00 00 00 e9 ed f7 ff ff be 8b ff ff ff 48 c7 c7 68 5d 65 a0 e8 52 69 c1 e0 <0f> 0b e9 30 fb ff ff 48 8b 45 60 48 05 d8 19 00 00 f0 48 0f ba 28 RSP: 0018:ffffc900090cfb80 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000f9677c000 RCX: 0000000000000000 RDX: 0000000000000002 RSI: ffffffff82464302 RDI: 00000000ffffffff RBP: ffff8889a044b220 R08: 0000000000009ffb R09: 00000000ffffdfff R10: 00000000ffffdfff R11: ffffffff8264dd80 R12: 0000000000000001 R13: ffff888ad87a4c98 R14: 0000000000000005 R15: ffff888a0c7d2a80 FS: 00007f823f5f7740(0000) GS:ffff889fcea40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000560ce0610b38 CR3: 0000000a907ec000 CR4: 0000000000350ef0 Call Trace: <TASK> ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] ? __warn+0x81/0x170 ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] ? report_bug+0x18d/0x1c0 ? tick_nohz_tick_stopped+0x12/0x30 ? srso_return_thunk+0x5/0x5f ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x13/0x60 ? asm_exc_invalid_op+0x16/0x20 ? __btrfs_free_extent.isra.0+0xf8e/0x1330 [btrfs] ? srso_return_thunk+0x5/0x5f ? rcu_is_watching+0xd/0x40 ? srso_return_thunk+0x5/0x5f ? lock_release+0x1e5/0x280 __btrfs_run_delayed_refs+0x64c/0x1380 [btrfs] ? btrfs_commit_transaction+0x3e/0x12d0 [btrfs] btrfs_run_delayed_refs+0x92/0x130 [btrfs] btrfs_commit_transaction+0xa2/0x12d0 [btrfs] ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? rcu_is_watching+0xd/0x40 ? srso_return_thunk+0x5/0x5f ? lock_release+0x1e5/0x280 btrfs_sync_file+0x532/0x660 [btrfs] __x64_sys_fsync+0x37/0x60 do_syscall_64+0x79/0x1a0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 RIP: 0033:0x7f823f6f8400 Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d e1 d1 0d 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c RSP: 002b:00007ffe3c26e9f8 EFLAGS: 00000202 ORIG_RAX: 000000000000004a RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f823f6f8400 RDX: 0000000000000193 RSI: 0000560cdfcdb048 RDI: 0000000000000003 RBP: 00000000000002e6 R08: 0000000000000007 R09: 00007ffe3c26ea0c R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffe3c26ea10 R13: 028f5c28f5c28f5c R14: 8f5c28f5c28f5c29 R15: 0000560cdfcd7180 </TASK> irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 softirqs last enabled at (0): [<ffffffff810e5e0e>] copy_process+0xb0e/0x1e00 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace 0000000000000000 ]--- BTRFS: error (device nvme1n2: state A) in __btrfs_free_extent:3249: errno=-117 Filesystem corrupted BTRFS info (device nvme1n2: state EA): forced readonly BTRFS info (device nvme1n2: state EA): leaf 66957836288 gen 3873 total ptrs 203 free space 1102 owner 2 BTRFS info (device nvme1n2: state EA): refs 2 lock_owner 29834 current 29834 item 0 key (63394947072 168 40960) itemoff 16230 itemsize 53 extent refs 1 gen 3835 flags 1 ref#0: extent data backref root 5 objectid 552 offset 1802240 count 1 (snip)... item 164 key (66948923392 169 0) itemoff 8229 itemsize 33 extent refs 1 gen 3872 flags 2 ref#0: tree block backref root 2 item 165 key (66948939776 169 1) itemoff 8196 itemsize 33 extent refs 1 gen 3873 flags 2 ref#0: tree block backref root 5 item 166 key (68719476736 168 110592) itemoff 8143 itemsize 53 extent refs 1 gen 3841 flags 1 ref#0: extent data backref root 5 objectid 440 offset 3100672 count 1 (snip)... item 202 key (68722249728 168 110592) itemoff 6177 itemsize 53 extent refs 1 gen 3842 flags 1 ref#0: extent data backref root 5 objectid 953 offset 5431296 count 1 BTRFS critical (device nvme1n2: state EA): unable to find ref byte nr 66948939776 parent 66948939776 root 5 owner 0 offset 0 slot 166 BTRFS error (device nvme1n2: state EA): failed to run delayed ref for logical 66948939776 num_bytes 16384 type 182 action 2 ref_mod 1: -2 BTRFS: error (device nvme1n2: state EA) in btrfs_run_delayed_refs:2246: errno=-2 No such entry BTRFS warning (device nvme1n2: state EA): Skipping commit of aborted transaction. BTRFS: error (device nvme1n2: state EA) in cleanup_transaction:2006: errno=-2 No such entry This happens maybe because clearing the contents is too early. It should clear the content after all the reference to the node is dropped. Addressing that root cause needs more time. Until that, leave the extent buffer header intact, so that we can add a proper delayed reference entry. Fixes: aa6313e6ff2b ("btrfs: zoned: don't clear dirty flag of extent buffer") Link: https://lore.kernel.org/linux-btrfs/oadvdekkturysgfgi4qzuemd57zudeasynswurjxw3ocdfsef6@sjyufeugh63f/ Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> --- fs/btrfs/disk-io.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)