Message ID | 20241203075651.109899-1-zhenghaoran154@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v3] btrfs: fix data race when accessing the inode's disk_i_size at btrfs_drop_extents() | expand |
On Tue, Dec 3, 2024 at 7:59 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote: > > A data race occurs when the function `insert_ordered_extent_file_extent()` > and the function `btrfs_inode_safe_disk_i_size_write()` are executed > concurrently. The function `insert_ordered_extent_file_extent()` is not > locked when reading inode->disk_i_size, causing > `btrfs_inode_safe_disk_i_size_write()` to cause data competition when > writing inode->disk_i_size, thus affecting the value of `modify_tree`. > > The specific call stack that appears during testing is as follows: > ============DATA_RACE============ > btrfs_drop_extents+0x89a/0xa060 [btrfs] > insert_reserved_file_extent+0xb54/0x2960 [btrfs] > insert_ordered_extent_file_extent+0xff5/0x1760 [btrfs] > btrfs_finish_one_ordered+0x1b85/0x36a0 [btrfs] > btrfs_finish_ordered_io+0x37/0x60 [btrfs] > finish_ordered_fn+0x3e/0x50 [btrfs] > btrfs_work_helper+0x9c9/0x27a0 [btrfs] > process_scheduled_works+0x716/0xf10 > worker_thread+0xb6a/0x1190 > kthread+0x292/0x330 > ret_from_fork+0x4d/0x80 > ret_from_fork_asm+0x1a/0x30 > ============OTHER_INFO============ > btrfs_inode_safe_disk_i_size_write+0x4ec/0x600 [btrfs] > btrfs_finish_one_ordered+0x24c7/0x36a0 [btrfs] > btrfs_finish_ordered_io+0x37/0x60 [btrfs] > finish_ordered_fn+0x3e/0x50 [btrfs] > btrfs_work_helper+0x9c9/0x27a0 [btrfs] > process_scheduled_works+0x716/0xf10 > worker_thread+0xb6a/0x1190 > kthread+0x292/0x330 > ret_from_fork+0x4d/0x80 > ret_from_fork_asm+0x1a/0x30 > ================================= > > The main purpose of the modify_tree variable is to optimize the > acquisition of read-write locks at or after the end of file (EOF). > When the value of modify_tree is modified due to concurrency, resulting > in unnecessary acquisition of write locks, the correctness of the > system will not be affected. When the system requires a write lock but > does not acquire the write lock because the value of modify_tree is > incorrect, the path will be subsequently released and the lock will > be reacquired to ensure the correctness of the system. Looks good, I'll slightly rephrase this on things like "modify_tree is modified due to concurrency" because what gets modified concurrently is inode->disk_i_size. I'll push it to the github btrfs for-next branch soon with this paragraph instead: "The main purpose of the check of the inode's disk_i_size is to avoid taking write locks on a btree path when we have a write at or beyond eof, since in these cases we don't expect to find extent items in the root to drop. However if we end up taking write locks due to a data race on disk_i_size, everything is still correct, we only add extra lock contention on the tree in case there's concurrency from other tasks. If the race causes us to not take write locks when we actually need them, then everything is functionally correct as well, since if we find out we have extent items to drop and we took read locks (modify_tree set to 0), we release the path and retry again with write locks." Thanks. Reviewed-by: Filipe Manana <fdmanana@suse.com> > > Since this data race does not affect the correctness of the function, > it is a harmless data race, and it is recommended to use `data_race` > to mark it. > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com> > --- > V2->V3: Added details on why this is a harmless data race > V1->V2: Modified patch description and format > --- > fs/btrfs/file.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > index 4fb521d91b06..559c177456e6 100644 > --- a/fs/btrfs/file.c > +++ b/fs/btrfs/file.c > @@ -242,7 +242,7 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans, > if (args->drop_cache) > btrfs_drop_extent_map_range(inode, args->start, args->end - 1, false); > > - if (args->start >= inode->disk_i_size && !args->replace_extent) > + if (data_race(args->start >= inode->disk_i_size) && !args->replace_extent) > modify_tree = 0; > > update_refs = (btrfs_root_id(root) != BTRFS_TREE_LOG_OBJECTID); > -- > 2.34.1 > >
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 4fb521d91b06..559c177456e6 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -242,7 +242,7 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans, if (args->drop_cache) btrfs_drop_extent_map_range(inode, args->start, args->end - 1, false); - if (args->start >= inode->disk_i_size && !args->replace_extent) + if (data_race(args->start >= inode->disk_i_size) && !args->replace_extent) modify_tree = 0; update_refs = (btrfs_root_id(root) != BTRFS_TREE_LOG_OBJECTID);
A data race occurs when the function `insert_ordered_extent_file_extent()` and the function `btrfs_inode_safe_disk_i_size_write()` are executed concurrently. The function `insert_ordered_extent_file_extent()` is not locked when reading inode->disk_i_size, causing `btrfs_inode_safe_disk_i_size_write()` to cause data competition when writing inode->disk_i_size, thus affecting the value of `modify_tree`. The specific call stack that appears during testing is as follows: ============DATA_RACE============ btrfs_drop_extents+0x89a/0xa060 [btrfs] insert_reserved_file_extent+0xb54/0x2960 [btrfs] insert_ordered_extent_file_extent+0xff5/0x1760 [btrfs] btrfs_finish_one_ordered+0x1b85/0x36a0 [btrfs] btrfs_finish_ordered_io+0x37/0x60 [btrfs] finish_ordered_fn+0x3e/0x50 [btrfs] btrfs_work_helper+0x9c9/0x27a0 [btrfs] process_scheduled_works+0x716/0xf10 worker_thread+0xb6a/0x1190 kthread+0x292/0x330 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 ============OTHER_INFO============ btrfs_inode_safe_disk_i_size_write+0x4ec/0x600 [btrfs] btrfs_finish_one_ordered+0x24c7/0x36a0 [btrfs] btrfs_finish_ordered_io+0x37/0x60 [btrfs] finish_ordered_fn+0x3e/0x50 [btrfs] btrfs_work_helper+0x9c9/0x27a0 [btrfs] process_scheduled_works+0x716/0xf10 worker_thread+0xb6a/0x1190 kthread+0x292/0x330 ret_from_fork+0x4d/0x80 ret_from_fork_asm+0x1a/0x30 ================================= The main purpose of the modify_tree variable is to optimize the acquisition of read-write locks at or after the end of file (EOF). When the value of modify_tree is modified due to concurrency, resulting in unnecessary acquisition of write locks, the correctness of the system will not be affected. When the system requires a write lock but does not acquire the write lock because the value of modify_tree is incorrect, the path will be subsequently released and the lock will be reacquired to ensure the correctness of the system. Since this data race does not affect the correctness of the function, it is a harmless data race, and it is recommended to use `data_race` to mark it. Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com> --- V2->V3: Added details on why this is a harmless data race V1->V2: Modified patch description and format --- fs/btrfs/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)