diff mbox series

[1/2] btrfs: fix btrfs_read_folio race in relocation

Message ID 13b9e50db9f1778e2819f597dbd82498860f1666.1734131353.git.boris@bur.io (mailing list archive)
State New
Headers show
Series fixes for btrfs_read_folio races | expand

Commit Message

Boris Burkov Dec. 13, 2024, 11:13 p.m. UTC
When we call btrfs_read_folio to bring a folio uptodate, we unlock the
folio. The result of that is that a different thread can modify the
mapping (like remove it with invalidate) before we call folio_lock.
This results in an invalid page and we need to try again.

In particular, if we are relocating concurrently with aborting a
transaction, this can result in a crash like the following:

BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 76 PID: 1411631 Comm: kworker/u322:5
Workqueue: events_unbound btrfs_reclaim_bgs_work
RIP: 0010:set_page_extent_mapped+0x20/0xb0
Code: c9 eb e8 cc cc cc cc cc cc cc 0f 1f 44 00 00 53 48 8b 47 08 a8 01 75 60 48 89 fb 66 90 48 f7 03 00 80 00 00 75 34 48 8b 4b 18 <48> 8b 01 48 8b 90 40 fe ff ff 48 8b ba 08 02 00 00 81 bf 8c 0c 00
RSP: 0018:ffffc900516a7be8 EFLAGS: 00010246
RAX: ffffea009e851d08 RBX: ffffea009e0b1880 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffc900516a7b90 RDI: ffffea009e0b1880
RBP: 0000000003573000 R08: 0000000000000001 R09: ffff88c07fd2f3f0
R10: 0000000000000000 R11: 0000194754b575be R12: 0000000003572000
R13: 0000000003572fff R14: 0000000000100cca R15: 0000000005582fff
FS:  0000000000000000(0000) GS:ffff88c07fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000407d00f002 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x78/0xc0
? page_fault_oops+0x2a8/0x3a0
? __switch_to+0x133/0x530
? wq_worker_running+0xa/0x40
? exc_page_fault+0x63/0x130
? asm_exc_page_fault+0x22/0x30
? set_page_extent_mapped+0x20/0xb0
relocate_file_extent_cluster+0x1a7/0x940
relocate_data_extent+0xaf/0x120
relocate_block_group+0x20f/0x480
btrfs_relocate_block_group+0x152/0x320
btrfs_relocate_chunk+0x3d/0x120
btrfs_reclaim_bgs_work+0x2ae/0x4e0
process_scheduled_works+0x184/0x370
worker_thread+0xc6/0x3e0
? blk_add_timer+0xb0/0xb0
kthread+0xae/0xe0
? flush_tlb_kernel_range+0x90/0x90
ret_from_fork+0x2f/0x40
? flush_tlb_kernel_range+0x90/0x90
ret_from_fork_asm+0x11/0x20
</TASK>

This occurs because cleanup_one_transaction calls
destroy_delalloc_inodes which calls invalidate_inode_pages2 which takes
the folio_lock before setting mapping to NULL. We fail to check this,
and subsequently call set_extent_mapping, which assumes that mapping !=
NULL (in fact it asserts that in debug mode)

Note that the "fixes" patch here is not the one that introduced the
race (the very first iteration of this code from 2009) but a more recent
change that made this particular crash happen in practice.

Fixes: e7f1326cc24e ("btrfs: set page extent mapped after read_folio in relocate_one_page")
Signed-off-by: Boris Burkov <boris@bur.io>
---
 fs/btrfs/relocation.c | 6 ++++++
 1 file changed, 6 insertions(+)
diff mbox series

Patch

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 6d3d2a9b6ed3..cdd9a7b15a11 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2831,6 +2831,7 @@  static int relocate_one_folio(struct reloc_control *rc,
 	const bool use_rst = btrfs_need_stripe_tree_update(fs_info, rc->block_group->flags);
 
 	ASSERT(index <= last_index);
+again:
 	folio = filemap_lock_folio(inode->i_mapping, index);
 	if (IS_ERR(folio)) {
 
@@ -2866,6 +2867,11 @@  static int relocate_one_folio(struct reloc_control *rc,
 			ret = -EIO;
 			goto release_folio;
 		}
+		if (folio->mapping != inode->i_mapping) {
+			folio_unlock(folio);
+			folio_put(folio);
+			goto again;
+		}
 	}
 
 	/*