Message ID | 20250208073829.1176287-1-zhenghaoran154@gmail.com (mailing list archive)
---|---
State | New
Series | btrfs: fix data race when accessing the block_group's used field
On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote: > > A data race may occur when the function `btrfs_discard_queue_work()` > and the function `btrfs_update_block_group()` is executed concurrently. > Specifically, when the `btrfs_update_block_group()` function is executed > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()` > is accessing `if(block_group->used == 0)` will appear with data race, > which may cause block_group to be placed unexpectedly in discard_list or > discard_unused_list. The specific function call stack is as follows: > > ============DATA_RACE============ > btrfs_discard_queue_work+0x245/0x500 [btrfs] > __btrfs_add_free_space+0x3066/0x32f0 [btrfs] > btrfs_add_free_space+0x19a/0x200 [btrfs] > unpin_extent_range+0x847/0x2120 [btrfs] > btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs] > btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs] > transaction_kthread+0x764/0xc20 [btrfs] > kthread+0x292/0x330 > ret_from_fork+0x4d/0x80 > ret_from_fork_asm+0x1a/0x30 > ============OTHER_INFO============ > btrfs_update_block_group+0xa9d/0x2430 [btrfs] > __btrfs_free_extent+0x4f69/0x9920 [btrfs] > __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs] > btrfs_run_delayed_refs+0x317/0x770 [btrfs] > flush_space+0x388/0x1440 [btrfs] > btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs] > process_scheduled_works+0x716/0xf10 > worker_thread+0xb6a/0x1190 > kthread+0x292/0x330 > ret_from_fork+0x4d/0x80 > ret_from_fork_asm+0x1a/0x30 > ================================= > > Although the `block_group->used` was checked again in the use of the > `peek_discard_list` function, considering that `block_group->used` is > a 64-bit variable, we still think that the data race here is an > unexpected behavior. It is recommended to add `READ_ONCE` and > `WRITE_ONCE` to read and write. > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com> > --- > fs/btrfs/block-group.c | 4 ++-- > fs/btrfs/discard.c | 2 +- > 2 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > index c0a8f7d92acc..c681b97f6835 100644 > --- a/fs/btrfs/block-group.c > +++ b/fs/btrfs/block-group.c > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > old_val = cache->used; > if (alloc) { > old_val += num_bytes; > - cache->used = old_val; > + WRITE_ONCE(cache->used, old_val); > cache->reserved -= num_bytes; > cache->reclaim_mark = 0; > space_info->bytes_reserved -= num_bytes; > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > spin_unlock(&space_info->lock); > } else { > old_val -= num_bytes; > - cache->used = old_val; > + WRITE_ONCE(cache->used, old_val); > cache->pinned += num_bytes; > btrfs_space_info_update_bytes_pinned(space_info, num_bytes); > space_info->bytes_used -= num_bytes; > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c > index e815d165cccc..71c57b571d50 100644 > --- a/fs/btrfs/discard.c > +++ b/fs/btrfs/discard.c > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, > if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC)) > return; > > - if (block_group->used == 0) > + if (READ_ONCE(block_group->used) == 0) There are at least 3 more places in discard.c where we access ->used without being under the protection of the block group's spinlock. So let's fix this for all places and not just a single one... Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place. 
What we typically do in btrfs is to add helpers that hide them, see block-rsv.h for example.

Also, I don't think we need READ_ONCE/WRITE_ONCE. We could use data_race(), though I think that could be subject to load/store tearing, or just take the lock. So adding a helper like this to block-group.h:

static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
{
        u64 ret;

        spin_lock(&bg->lock);
        ret = bg->used;
        spin_unlock(&bg->lock);

        return ret;
}

And then use btrfs_block_group_used() everywhere in discard.c where we aren't holding a block group's lock.

Thanks.

> add_to_discard_unused_list(discard_ctl, block_group);
> else
> add_to_discard_list(discard_ctl, block_group);
> --
> 2.34.1
>
>
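For context, a minimal sketch of what that conversion could look like at the call site quoted above, assuming the btrfs_block_group_used() helper lands in block-group.h exactly as written (this is an illustration, not the committed change):

/* fs/btrfs/discard.c -- hypothetical conversion of btrfs_discard_queue_work() */
void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
                              struct btrfs_block_group *block_group)
{
        if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
                return;

        /* Read ->used through the helper so the load happens under bg->lock. */
        if (btrfs_block_group_used(block_group) == 0)
                add_to_discard_unused_list(discard_ctl, block_group);
        else
                add_to_discard_list(discard_ctl, block_group);
}

The other unlocked readers of ->used in discard.c that the review mentions would get the same one-line substitution.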
On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@kernel.org> wrote: > > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote: > > > > A data race may occur when the function `btrfs_discard_queue_work()` > > and the function `btrfs_update_block_group()` is executed concurrently. > > Specifically, when the `btrfs_update_block_group()` function is executed > > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()` > > is accessing `if(block_group->used == 0)` will appear with data race, > > which may cause block_group to be placed unexpectedly in discard_list or > > discard_unused_list. The specific function call stack is as follows: > > > > ============DATA_RACE============ > > btrfs_discard_queue_work+0x245/0x500 [btrfs] > > __btrfs_add_free_space+0x3066/0x32f0 [btrfs] > > btrfs_add_free_space+0x19a/0x200 [btrfs] > > unpin_extent_range+0x847/0x2120 [btrfs] > > btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs] > > btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs] > > transaction_kthread+0x764/0xc20 [btrfs] > > kthread+0x292/0x330 > > ret_from_fork+0x4d/0x80 > > ret_from_fork_asm+0x1a/0x30 > > ============OTHER_INFO============ > > btrfs_update_block_group+0xa9d/0x2430 [btrfs] > > __btrfs_free_extent+0x4f69/0x9920 [btrfs] > > __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs] > > btrfs_run_delayed_refs+0x317/0x770 [btrfs] > > flush_space+0x388/0x1440 [btrfs] > > btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs] > > process_scheduled_works+0x716/0xf10 > > worker_thread+0xb6a/0x1190 > > kthread+0x292/0x330 > > ret_from_fork+0x4d/0x80 > > ret_from_fork_asm+0x1a/0x30 > > ================================= > > > > Although the `block_group->used` was checked again in the use of the > > `peek_discard_list` function, considering that `block_group->used` is > > a 64-bit variable, we still think that the data race here is an > > unexpected behavior. It is recommended to add `READ_ONCE` and > > `WRITE_ONCE` to read and write. 
> > > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com> > > --- > > fs/btrfs/block-group.c | 4 ++-- > > fs/btrfs/discard.c | 2 +- > > 2 files changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > > index c0a8f7d92acc..c681b97f6835 100644 > > --- a/fs/btrfs/block-group.c > > +++ b/fs/btrfs/block-group.c > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > old_val = cache->used; > > if (alloc) { > > old_val += num_bytes; > > - cache->used = old_val; > > + WRITE_ONCE(cache->used, old_val); > > cache->reserved -= num_bytes; > > cache->reclaim_mark = 0; > > space_info->bytes_reserved -= num_bytes; > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > spin_unlock(&space_info->lock); > > } else { > > old_val -= num_bytes; > > - cache->used = old_val; > > + WRITE_ONCE(cache->used, old_val); > > cache->pinned += num_bytes; > > btrfs_space_info_update_bytes_pinned(space_info, num_bytes); > > space_info->bytes_used -= num_bytes; > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c > > index e815d165cccc..71c57b571d50 100644 > > --- a/fs/btrfs/discard.c > > +++ b/fs/btrfs/discard.c > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, > > if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC)) > > return; > > > > - if (block_group->used == 0) > > + if (READ_ONCE(block_group->used) == 0) > > There are at least 3 more places in discard.c where we access ->used > without being under the protection of the block group's spinlock. > So let's fix this for all places and not just a single one... > > Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place. > What we typically do in btrfs is to add helpers that hide them, see > block-rsv.h for example. > > Also, I don't think we need READ_ONCE/WRITE_ONCE. > We could use data_race(), though I think that could be subject to > load/store tearing, or just take the lock. > So adding a helper like this to block-group.h: > > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg) > { > u64 ret; > > spin_lock(&bg->lock); > ret = bg->used; > spin_unlock(&bg->lock); > > return ret; > } Would memory barriers be sufficient here? Taking a lock just for reading one member seems excessive... > And then use btrfs_bock_group_used() everywhere in discard.c where we > aren't holding a block group's lock. > > Thanks. > > > > add_to_discard_unused_list(discard_ctl, block_group); > > else > > add_to_discard_list(discard_ctl, block_group); > > -- > > 2.34.1 > > > > >
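For reference, the barrier-based alternative hinted at here would pair a release store on the writer side with an acquire load on the reader side instead of taking bg->lock, roughly as sketched below. This is only a hypothetical sketch of the idea, not a proposal from the thread; whether any ordering beyond avoiding torn accesses is actually needed at these call sites is exactly what the rest of the discussion debates.

        /* writer side, in btrfs_update_block_group() */
        smp_store_release(&cache->used, old_val);

        /* reader side, e.g. in btrfs_discard_queue_work() */
        if (smp_load_acquire(&block_group->used) == 0)
                add_to_discard_unused_list(discard_ctl, block_group);
        else
                add_to_discard_list(discard_ctl, block_group);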
On Tue, Feb 18, 2025 at 8:08 AM Daniel Vacek <neelx@suse.com> wrote: > > On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@kernel.org> wrote: > > > > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote: > > > > > > A data race may occur when the function `btrfs_discard_queue_work()` > > > and the function `btrfs_update_block_group()` is executed concurrently. > > > Specifically, when the `btrfs_update_block_group()` function is executed > > > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()` > > > is accessing `if(block_group->used == 0)` will appear with data race, > > > which may cause block_group to be placed unexpectedly in discard_list or > > > discard_unused_list. The specific function call stack is as follows: > > > > > > ============DATA_RACE============ > > > btrfs_discard_queue_work+0x245/0x500 [btrfs] > > > __btrfs_add_free_space+0x3066/0x32f0 [btrfs] > > > btrfs_add_free_space+0x19a/0x200 [btrfs] > > > unpin_extent_range+0x847/0x2120 [btrfs] > > > btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs] > > > btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs] > > > transaction_kthread+0x764/0xc20 [btrfs] > > > kthread+0x292/0x330 > > > ret_from_fork+0x4d/0x80 > > > ret_from_fork_asm+0x1a/0x30 > > > ============OTHER_INFO============ > > > btrfs_update_block_group+0xa9d/0x2430 [btrfs] > > > __btrfs_free_extent+0x4f69/0x9920 [btrfs] > > > __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs] > > > btrfs_run_delayed_refs+0x317/0x770 [btrfs] > > > flush_space+0x388/0x1440 [btrfs] > > > btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs] > > > process_scheduled_works+0x716/0xf10 > > > worker_thread+0xb6a/0x1190 > > > kthread+0x292/0x330 > > > ret_from_fork+0x4d/0x80 > > > ret_from_fork_asm+0x1a/0x30 > > > ================================= > > > > > > Although the `block_group->used` was checked again in the use of the > > > `peek_discard_list` function, considering that `block_group->used` is > > > a 64-bit variable, we still think that the data race here is an > > > unexpected behavior. It is recommended to add `READ_ONCE` and > > > `WRITE_ONCE` to read and write. 
> > > > > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com> > > > --- > > > fs/btrfs/block-group.c | 4 ++-- > > > fs/btrfs/discard.c | 2 +- > > > 2 files changed, 3 insertions(+), 3 deletions(-) > > > > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > > > index c0a8f7d92acc..c681b97f6835 100644 > > > --- a/fs/btrfs/block-group.c > > > +++ b/fs/btrfs/block-group.c > > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > > old_val = cache->used; > > > if (alloc) { > > > old_val += num_bytes; > > > - cache->used = old_val; > > > + WRITE_ONCE(cache->used, old_val); > > > cache->reserved -= num_bytes; > > > cache->reclaim_mark = 0; > > > space_info->bytes_reserved -= num_bytes; > > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > > spin_unlock(&space_info->lock); > > > } else { > > > old_val -= num_bytes; > > > - cache->used = old_val; > > > + WRITE_ONCE(cache->used, old_val); > > > cache->pinned += num_bytes; > > > btrfs_space_info_update_bytes_pinned(space_info, num_bytes); > > > space_info->bytes_used -= num_bytes; > > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c > > > index e815d165cccc..71c57b571d50 100644 > > > --- a/fs/btrfs/discard.c > > > +++ b/fs/btrfs/discard.c > > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, > > > if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC)) > > > return; > > > > > > - if (block_group->used == 0) > > > + if (READ_ONCE(block_group->used) == 0) > > > > There are at least 3 more places in discard.c where we access ->used > > without being under the protection of the block group's spinlock. > > So let's fix this for all places and not just a single one... > > > > Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place. > > What we typically do in btrfs is to add helpers that hide them, see > > block-rsv.h for example. > > > > Also, I don't think we need READ_ONCE/WRITE_ONCE. > > We could use data_race(), though I think that could be subject to > > load/store tearing, or just take the lock. > > So adding a helper like this to block-group.h: > > > > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg) > > { > > u64 ret; > > > > spin_lock(&bg->lock); > > ret = bg->used; > > spin_unlock(&bg->lock); > > > > return ret; > > } > > Would memory barriers be sufficient here? Taking a lock just for > reading one member seems excessive... Do you think there's heavy contention on this lock? Usually I prefer to go the simplest way, and using locks is a lot more straightforward and easier to understand than memory barriers. Unless it's clear there's an observable performance penalty, keeping it simple is better. data_race() may be ok here, at least for one of the unprotected accesses it's fine, but would have to analyse the other 3 cases. > > > And then use btrfs_bock_group_used() everywhere in discard.c where we > > aren't holding a block group's lock. > > > > Thanks. > > > > > > > add_to_discard_unused_list(discard_ctl, block_group); > > > else > > > add_to_discard_list(discard_ctl, block_group); > > > -- > > > 2.34.1 > > > > > > > >
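For reference, data_race() (from <linux/compiler.h>) only tells KCSAN that a racy access is intentional; unlike READ_ONCE() it does not prevent load/store tearing, which is why each call site needs its own analysis as noted above. A hypothetical use at the site touched by the patch might look like:

        /* The racy read is deliberate; a stale or torn value is tolerated here. */
        if (data_race(block_group->used) == 0)
                add_to_discard_unused_list(discard_ctl, block_group);
        else
                add_to_discard_list(discard_ctl, block_group);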
On Tue, 18 Feb 2025 at 11:19, Filipe Manana <fdmanana@kernel.org> wrote: > > On Tue, Feb 18, 2025 at 8:08 AM Daniel Vacek <neelx@suse.com> wrote: > > > > On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@kernel.org> wrote: > > > > > > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote: > > > > > > > > A data race may occur when the function `btrfs_discard_queue_work()` > > > > and the function `btrfs_update_block_group()` is executed concurrently. > > > > Specifically, when the `btrfs_update_block_group()` function is executed > > > > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()` > > > > is accessing `if(block_group->used == 0)` will appear with data race, > > > > which may cause block_group to be placed unexpectedly in discard_list or > > > > discard_unused_list. The specific function call stack is as follows: > > > > > > > > ============DATA_RACE============ > > > > btrfs_discard_queue_work+0x245/0x500 [btrfs] > > > > __btrfs_add_free_space+0x3066/0x32f0 [btrfs] > > > > btrfs_add_free_space+0x19a/0x200 [btrfs] > > > > unpin_extent_range+0x847/0x2120 [btrfs] > > > > btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs] > > > > btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs] > > > > transaction_kthread+0x764/0xc20 [btrfs] > > > > kthread+0x292/0x330 > > > > ret_from_fork+0x4d/0x80 > > > > ret_from_fork_asm+0x1a/0x30 > > > > ============OTHER_INFO============ > > > > btrfs_update_block_group+0xa9d/0x2430 [btrfs] > > > > __btrfs_free_extent+0x4f69/0x9920 [btrfs] > > > > __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs] > > > > btrfs_run_delayed_refs+0x317/0x770 [btrfs] > > > > flush_space+0x388/0x1440 [btrfs] > > > > btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs] > > > > process_scheduled_works+0x716/0xf10 > > > > worker_thread+0xb6a/0x1190 > > > > kthread+0x292/0x330 > > > > ret_from_fork+0x4d/0x80 > > > > ret_from_fork_asm+0x1a/0x30 > > > > ================================= > > > > > > > > Although the `block_group->used` was checked again in the use of the > > > > `peek_discard_list` function, considering that `block_group->used` is > > > > a 64-bit variable, we still think that the data race here is an > > > > unexpected behavior. It is recommended to add `READ_ONCE` and > > > > `WRITE_ONCE` to read and write. 
> > > > > > > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com> > > > > --- > > > > fs/btrfs/block-group.c | 4 ++-- > > > > fs/btrfs/discard.c | 2 +- > > > > 2 files changed, 3 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > > > > index c0a8f7d92acc..c681b97f6835 100644 > > > > --- a/fs/btrfs/block-group.c > > > > +++ b/fs/btrfs/block-group.c > > > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > > > old_val = cache->used; > > > > if (alloc) { > > > > old_val += num_bytes; > > > > - cache->used = old_val; > > > > + WRITE_ONCE(cache->used, old_val); > > > > cache->reserved -= num_bytes; > > > > cache->reclaim_mark = 0; > > > > space_info->bytes_reserved -= num_bytes; > > > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > > > spin_unlock(&space_info->lock); > > > > } else { > > > > old_val -= num_bytes; > > > > - cache->used = old_val; > > > > + WRITE_ONCE(cache->used, old_val); > > > > cache->pinned += num_bytes; > > > > btrfs_space_info_update_bytes_pinned(space_info, num_bytes); > > > > space_info->bytes_used -= num_bytes; > > > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c > > > > index e815d165cccc..71c57b571d50 100644 > > > > --- a/fs/btrfs/discard.c > > > > +++ b/fs/btrfs/discard.c > > > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, > > > > if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC)) > > > > return; > > > > > > > > - if (block_group->used == 0) > > > > + if (READ_ONCE(block_group->used) == 0) > > > > > > There are at least 3 more places in discard.c where we access ->used > > > without being under the protection of the block group's spinlock. > > > So let's fix this for all places and not just a single one... > > > > > > Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place. > > > What we typically do in btrfs is to add helpers that hide them, see > > > block-rsv.h for example. > > > > > > Also, I don't think we need READ_ONCE/WRITE_ONCE. > > > We could use data_race(), though I think that could be subject to > > > load/store tearing, or just take the lock. > > > So adding a helper like this to block-group.h: > > > > > > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg) > > > { > > > u64 ret; > > > > > > spin_lock(&bg->lock); > > > ret = bg->used; > > > spin_unlock(&bg->lock); > > > > > > return ret; > > > } > > > > Would memory barriers be sufficient here? Taking a lock just for > > reading one member seems excessive... > > Do you think there's heavy contention on this lock? This is not about contention. Spin lock should just never be used this way. Or any lock actually. The critical section only contains a single fetch operation which does not justify using a lock. Hence the only guarantee such lock usage provides are the implicit memory barriers (from which maybe only one of them is really needed depending on the context where this helper is going to be used). Simply put, the lock is degraded into a memory barrier this way. So why not just use the barriers in the first place if only ordering guarantees are required? It only makes sense. > Usually I prefer to go the simplest way, and using locks is a lot more > straightforward and easier to understand than memory barriers. How so? Locks provide the same memory barriers implicitly. 
> Unless it's clear there's an observable performance penalty, keeping > it simple is better. Exactly. Here is where our opinions differ. For me 'simple' means without useless locking. I mean especially with locks they should only be used when absolutely necessary. In a sense of 'use the right tool for the job'. There are more lightweight tools possible in this context. Locks provide additional guarantees which are not needed here. On the other hand I understand that using a lock is a 'better safe than sorry' approach which should also work. Just keep in mind that spinlock may sleep on RT. > data_race() may be ok here, at least for one of the unprotected > accesses it's fine, but would have to analyse the other 3 cases. That's exactly my thoughts. Maybe not even the barriers are needed after all. This needs to be checked on a per-case basis. > > > > > And then use btrfs_bock_group_used() everywhere in discard.c where we > > > aren't holding a block group's lock. > > > > > > Thanks. > > > > > > > > > > add_to_discard_unused_list(discard_ctl, block_group); > > > > else > > > > add_to_discard_list(discard_ctl, block_group); > > > > -- > > > > 2.34.1 > > > > > > > > > > >
On Tue, Feb 18, 2025 at 4:14 PM Daniel Vacek <neelx@suse.com> wrote: > > On Tue, 18 Feb 2025 at 11:19, Filipe Manana <fdmanana@kernel.org> wrote: > > > > On Tue, Feb 18, 2025 at 8:08 AM Daniel Vacek <neelx@suse.com> wrote: > > > > > > On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@kernel.org> wrote: > > > > > > > > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote: > > > > > > > > > > A data race may occur when the function `btrfs_discard_queue_work()` > > > > > and the function `btrfs_update_block_group()` is executed concurrently. > > > > > Specifically, when the `btrfs_update_block_group()` function is executed > > > > > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()` > > > > > is accessing `if(block_group->used == 0)` will appear with data race, > > > > > which may cause block_group to be placed unexpectedly in discard_list or > > > > > discard_unused_list. The specific function call stack is as follows: > > > > > > > > > > ============DATA_RACE============ > > > > > btrfs_discard_queue_work+0x245/0x500 [btrfs] > > > > > __btrfs_add_free_space+0x3066/0x32f0 [btrfs] > > > > > btrfs_add_free_space+0x19a/0x200 [btrfs] > > > > > unpin_extent_range+0x847/0x2120 [btrfs] > > > > > btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs] > > > > > btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs] > > > > > transaction_kthread+0x764/0xc20 [btrfs] > > > > > kthread+0x292/0x330 > > > > > ret_from_fork+0x4d/0x80 > > > > > ret_from_fork_asm+0x1a/0x30 > > > > > ============OTHER_INFO============ > > > > > btrfs_update_block_group+0xa9d/0x2430 [btrfs] > > > > > __btrfs_free_extent+0x4f69/0x9920 [btrfs] > > > > > __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs] > > > > > btrfs_run_delayed_refs+0x317/0x770 [btrfs] > > > > > flush_space+0x388/0x1440 [btrfs] > > > > > btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs] > > > > > process_scheduled_works+0x716/0xf10 > > > > > worker_thread+0xb6a/0x1190 > > > > > kthread+0x292/0x330 > > > > > ret_from_fork+0x4d/0x80 > > > > > ret_from_fork_asm+0x1a/0x30 > > > > > ================================= > > > > > > > > > > Although the `block_group->used` was checked again in the use of the > > > > > `peek_discard_list` function, considering that `block_group->used` is > > > > > a 64-bit variable, we still think that the data race here is an > > > > > unexpected behavior. It is recommended to add `READ_ONCE` and > > > > > `WRITE_ONCE` to read and write. 
> > > > > > > > > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com> > > > > > --- > > > > > fs/btrfs/block-group.c | 4 ++-- > > > > > fs/btrfs/discard.c | 2 +- > > > > > 2 files changed, 3 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > > > > > index c0a8f7d92acc..c681b97f6835 100644 > > > > > --- a/fs/btrfs/block-group.c > > > > > +++ b/fs/btrfs/block-group.c > > > > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > > > > old_val = cache->used; > > > > > if (alloc) { > > > > > old_val += num_bytes; > > > > > - cache->used = old_val; > > > > > + WRITE_ONCE(cache->used, old_val); > > > > > cache->reserved -= num_bytes; > > > > > cache->reclaim_mark = 0; > > > > > space_info->bytes_reserved -= num_bytes; > > > > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > > > > spin_unlock(&space_info->lock); > > > > > } else { > > > > > old_val -= num_bytes; > > > > > - cache->used = old_val; > > > > > + WRITE_ONCE(cache->used, old_val); > > > > > cache->pinned += num_bytes; > > > > > btrfs_space_info_update_bytes_pinned(space_info, num_bytes); > > > > > space_info->bytes_used -= num_bytes; > > > > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c > > > > > index e815d165cccc..71c57b571d50 100644 > > > > > --- a/fs/btrfs/discard.c > > > > > +++ b/fs/btrfs/discard.c > > > > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, > > > > > if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC)) > > > > > return; > > > > > > > > > > - if (block_group->used == 0) > > > > > + if (READ_ONCE(block_group->used) == 0) > > > > > > > > There are at least 3 more places in discard.c where we access ->used > > > > without being under the protection of the block group's spinlock. > > > > So let's fix this for all places and not just a single one... > > > > > > > > Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place. > > > > What we typically do in btrfs is to add helpers that hide them, see > > > > block-rsv.h for example. > > > > > > > > Also, I don't think we need READ_ONCE/WRITE_ONCE. > > > > We could use data_race(), though I think that could be subject to > > > > load/store tearing, or just take the lock. > > > > So adding a helper like this to block-group.h: > > > > > > > > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg) > > > > { > > > > u64 ret; > > > > > > > > spin_lock(&bg->lock); > > > > ret = bg->used; > > > > spin_unlock(&bg->lock); > > > > > > > > return ret; > > > > } > > > > > > Would memory barriers be sufficient here? Taking a lock just for > > > reading one member seems excessive... > > > > Do you think there's heavy contention on this lock? > > This is not about contention. Spin lock should just never be used this > way. Or any lock actually. The critical section only contains a single > fetch operation which does not justify using a lock. > Hence the only guarantee such lock usage provides are the implicit > memory barriers (from which maybe only one of them is really needed > depending on the context where this helper is going to be used). > > Simply put, the lock is degraded into a memory barrier this way. So > why not just use the barriers in the first place if only ordering > guarantees are required? It only makes sense. 
As said earlier, it's a lot easier to reason about lock() and unlock() calls rather than spreading memory barriers in the write and read sides. Historically we had several mistakes with barriers; they're simply not as straightforward to reason about as locks.

Plus spreading the barriers in the read and write sides makes the code not so easy to read, not to mention in case of any new sites updating the member, we'll have to not forget adding a barrier.

It's just easier to reason and maintain.

> > > Usually I prefer to go the simplest way, and using locks is a lot more > > straightforward and easier to understand than memory barriers. > > How so? Locks provide the same memory barriers implicitly.

I'm not saying they don't. I'm talking about easier code to read and maintain.

> > > Unless it's clear there's an observable performance penalty, keeping > > it simple is better. > > Exactly. Here is where our opinions differ. For me 'simple' means > without useless locking. > I mean especially with locks they should only be used when absolutely > necessary. In a sense of 'use the right tool for the job'. There are > more lightweight tools possible in this context. Locks provide > additional guarantees which are not needed here. > > On the other hand I understand that using a lock is a 'better safe > than sorry' approach which should also work. Just keep in mind that > spinlock may sleep on RT.

It's not about a better safe than sorry, but easier to read, reason and maintain.

> > > data_race() may be ok here, at least for one of the unprotected > > accesses it's fine, but would have to analyse the other 3 cases. > > That's exactly my thoughts. Maybe not even the barriers are needed > after all. This needs to be checked on a per-case basis. > > > > > > > And then use btrfs_bock_group_used() everywhere in discard.c where we > > > aren't holding a block group's lock. > > > > > > Thanks. > > > > > > > > > > add_to_discard_unused_list(discard_ctl, block_group); > > > > > else > > > > > add_to_discard_list(discard_ctl, block_group); > > > > > -- > > > > > 2.34.1 > > > > > > > > > > > > > >
On Tue, 18 Feb 2025 at 17:29, Filipe Manana <fdmanana@kernel.org> wrote: > > On Tue, Feb 18, 2025 at 4:14 PM Daniel Vacek <neelx@suse.com> wrote: > > > > On Tue, 18 Feb 2025 at 11:19, Filipe Manana <fdmanana@kernel.org> wrote: > > > > > > On Tue, Feb 18, 2025 at 8:08 AM Daniel Vacek <neelx@suse.com> wrote: > > > > > > > > On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@kernel.org> wrote: > > > > > > > > > > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote: > > > > > > > > > > > > A data race may occur when the function `btrfs_discard_queue_work()` > > > > > > and the function `btrfs_update_block_group()` is executed concurrently. > > > > > > Specifically, when the `btrfs_update_block_group()` function is executed > > > > > > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()` > > > > > > is accessing `if(block_group->used == 0)` will appear with data race, > > > > > > which may cause block_group to be placed unexpectedly in discard_list or > > > > > > discard_unused_list. The specific function call stack is as follows: > > > > > > > > > > > > ============DATA_RACE============ > > > > > > btrfs_discard_queue_work+0x245/0x500 [btrfs] > > > > > > __btrfs_add_free_space+0x3066/0x32f0 [btrfs] > > > > > > btrfs_add_free_space+0x19a/0x200 [btrfs] > > > > > > unpin_extent_range+0x847/0x2120 [btrfs] > > > > > > btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs] > > > > > > btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs] > > > > > > transaction_kthread+0x764/0xc20 [btrfs] > > > > > > kthread+0x292/0x330 > > > > > > ret_from_fork+0x4d/0x80 > > > > > > ret_from_fork_asm+0x1a/0x30 > > > > > > ============OTHER_INFO============ > > > > > > btrfs_update_block_group+0xa9d/0x2430 [btrfs] > > > > > > __btrfs_free_extent+0x4f69/0x9920 [btrfs] > > > > > > __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs] > > > > > > btrfs_run_delayed_refs+0x317/0x770 [btrfs] > > > > > > flush_space+0x388/0x1440 [btrfs] > > > > > > btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs] > > > > > > process_scheduled_works+0x716/0xf10 > > > > > > worker_thread+0xb6a/0x1190 > > > > > > kthread+0x292/0x330 > > > > > > ret_from_fork+0x4d/0x80 > > > > > > ret_from_fork_asm+0x1a/0x30 > > > > > > ================================= > > > > > > > > > > > > Although the `block_group->used` was checked again in the use of the > > > > > > `peek_discard_list` function, considering that `block_group->used` is > > > > > > a 64-bit variable, we still think that the data race here is an > > > > > > unexpected behavior. It is recommended to add `READ_ONCE` and > > > > > > `WRITE_ONCE` to read and write. 
> > > > > > > > > > > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com> > > > > > > --- > > > > > > fs/btrfs/block-group.c | 4 ++-- > > > > > > fs/btrfs/discard.c | 2 +- > > > > > > 2 files changed, 3 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > > > > > > index c0a8f7d92acc..c681b97f6835 100644 > > > > > > --- a/fs/btrfs/block-group.c > > > > > > +++ b/fs/btrfs/block-group.c > > > > > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > > > > > old_val = cache->used; > > > > > > if (alloc) { > > > > > > old_val += num_bytes; > > > > > > - cache->used = old_val; > > > > > > + WRITE_ONCE(cache->used, old_val); > > > > > > cache->reserved -= num_bytes; > > > > > > cache->reclaim_mark = 0; > > > > > > space_info->bytes_reserved -= num_bytes; > > > > > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > > > > > spin_unlock(&space_info->lock); > > > > > > } else { > > > > > > old_val -= num_bytes; > > > > > > - cache->used = old_val; > > > > > > + WRITE_ONCE(cache->used, old_val); > > > > > > cache->pinned += num_bytes; > > > > > > btrfs_space_info_update_bytes_pinned(space_info, num_bytes); > > > > > > space_info->bytes_used -= num_bytes; > > > > > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c > > > > > > index e815d165cccc..71c57b571d50 100644 > > > > > > --- a/fs/btrfs/discard.c > > > > > > +++ b/fs/btrfs/discard.c > > > > > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, > > > > > > if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC)) > > > > > > return; > > > > > > > > > > > > - if (block_group->used == 0) > > > > > > + if (READ_ONCE(block_group->used) == 0) > > > > > > > > > > There are at least 3 more places in discard.c where we access ->used > > > > > without being under the protection of the block group's spinlock. > > > > > So let's fix this for all places and not just a single one... > > > > > > > > > > Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place. > > > > > What we typically do in btrfs is to add helpers that hide them, see > > > > > block-rsv.h for example. > > > > > > > > > > Also, I don't think we need READ_ONCE/WRITE_ONCE. > > > > > We could use data_race(), though I think that could be subject to > > > > > load/store tearing, or just take the lock. > > > > > So adding a helper like this to block-group.h: > > > > > > > > > > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg) > > > > > { > > > > > u64 ret; > > > > > > > > > > spin_lock(&bg->lock); > > > > > ret = bg->used; > > > > > spin_unlock(&bg->lock); > > > > > > > > > > return ret; > > > > > } > > > > > > > > Would memory barriers be sufficient here? Taking a lock just for > > > > reading one member seems excessive... > > > > > > Do you think there's heavy contention on this lock? > > > > This is not about contention. Spin lock should just never be used this > > way. Or any lock actually. The critical section only contains a single > > fetch operation which does not justify using a lock. > > Hence the only guarantee such lock usage provides are the implicit > > memory barriers (from which maybe only one of them is really needed > > depending on the context where this helper is going to be used). > > > > Simply put, the lock is degraded into a memory barrier this way. 
So > > why not just use the barriers in the first place if only ordering > > guarantees are required? It only makes sense. > > As said earlier, it's a lot easier to reason about lock() and unlock() > calls rather than spreading memory barriers in the write and read > sides. > Historically he had several mistakes with barriers, they're simply not > as straightforward to reason as locks. > > Plus spreading the barriers in the read and write sides makes the code > not so easy to read, not to mention in case of any new sites updating > the member, we'll have to not forget adding a barrier. > > It's just easier to reason and maintain. > > > > > > > Usually I prefer to go the simplest way, and using locks is a lot more > > > straightforward and easier to understand than memory barriers. > > > > How so? Locks provide the same memory barriers implicitly. > > I'm not saying they don't. > I'm talking about easier code to read and maintain. > > > > > > Unless it's clear there's an observable performance penalty, keeping > > > it simple is better. > > > > Exactly. Here is where our opinions differ. For me 'simple' means > > without useless locking. > > I mean especially with locks they should only be used when absolutely > > necessary. In a sense of 'use the right tool for the job'. There are > > more lightweight tools possible in this context. Locks provide > > additional guarantees which are not needed here. > > > > On the other hand I understand that using a lock is a 'better safe > > than sorry' approach which should also work. Just keep in mind that > > spinlock may sleep on RT. > > It's not about a better safe than sorry, but easier to read, reason > and maintain. I see. Understood. I guess I'm just too micro-optimizing where it's not really needed and only adds complexity in a way. > > > > > data_race() may be ok here, at least for one of the unprotected > > > accesses it's fine, but would have to analyse the other 3 cases. > > > > That's exactly my thoughts. Maybe not even the barriers are needed > > after all. This needs to be checked on a per-case basis. > > > > > > > > > > > And then use btrfs_bock_group_used() everywhere in discard.c where we > > > > > aren't holding a block group's lock. > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > add_to_discard_unused_list(discard_ctl, block_group); > > > > > > else > > > > > > add_to_discard_list(discard_ctl, block_group); > > > > > > -- > > > > > > 2.34.1 > > > > > > > > > > > > > > > > >
On Wed, Feb 19, 2025 at 12:29 AM Filipe Manana <fdmanana@kernel.org> wrote: > > On Tue, Feb 18, 2025 at 4:14 PM Daniel Vacek <neelx@suse.com> wrote: > > > > On Tue, 18 Feb 2025 at 11:19, Filipe Manana <fdmanana@kernel.org> wrote: > > > > > > On Tue, Feb 18, 2025 at 8:08 AM Daniel Vacek <neelx@suse.com> wrote: > > > > > > > > On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@kernel.org> wrote: > > > > > > > > > > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote: > > > > > > > > > > > > A data race may occur when the function `btrfs_discard_queue_work()` > > > > > > and the function `btrfs_update_block_group()` is executed concurrently. > > > > > > Specifically, when the `btrfs_update_block_group()` function is executed > > > > > > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()` > > > > > > is accessing `if(block_group->used == 0)` will appear with data race, > > > > > > which may cause block_group to be placed unexpectedly in discard_list or > > > > > > discard_unused_list. The specific function call stack is as follows: > > > > > > > > > > > > ============DATA_RACE============ > > > > > > btrfs_discard_queue_work+0x245/0x500 [btrfs] > > > > > > __btrfs_add_free_space+0x3066/0x32f0 [btrfs] > > > > > > btrfs_add_free_space+0x19a/0x200 [btrfs] > > > > > > unpin_extent_range+0x847/0x2120 [btrfs] > > > > > > btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs] > > > > > > btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs] > > > > > > transaction_kthread+0x764/0xc20 [btrfs] > > > > > > kthread+0x292/0x330 > > > > > > ret_from_fork+0x4d/0x80 > > > > > > ret_from_fork_asm+0x1a/0x30 > > > > > > ============OTHER_INFO============ > > > > > > btrfs_update_block_group+0xa9d/0x2430 [btrfs] > > > > > > __btrfs_free_extent+0x4f69/0x9920 [btrfs] > > > > > > __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs] > > > > > > btrfs_run_delayed_refs+0x317/0x770 [btrfs] > > > > > > flush_space+0x388/0x1440 [btrfs] > > > > > > btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs] > > > > > > process_scheduled_works+0x716/0xf10 > > > > > > worker_thread+0xb6a/0x1190 > > > > > > kthread+0x292/0x330 > > > > > > ret_from_fork+0x4d/0x80 > > > > > > ret_from_fork_asm+0x1a/0x30 > > > > > > ================================= > > > > > > > > > > > > Although the `block_group->used` was checked again in the use of the > > > > > > `peek_discard_list` function, considering that `block_group->used` is > > > > > > a 64-bit variable, we still think that the data race here is an > > > > > > unexpected behavior. It is recommended to add `READ_ONCE` and > > > > > > `WRITE_ONCE` to read and write. 
> > > > > > > > > > > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com> > > > > > > --- > > > > > > fs/btrfs/block-group.c | 4 ++-- > > > > > > fs/btrfs/discard.c | 2 +- > > > > > > 2 files changed, 3 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > > > > > > index c0a8f7d92acc..c681b97f6835 100644 > > > > > > --- a/fs/btrfs/block-group.c > > > > > > +++ b/fs/btrfs/block-group.c > > > > > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > > > > > old_val = cache->used; > > > > > > if (alloc) { > > > > > > old_val += num_bytes; > > > > > > - cache->used = old_val; > > > > > > + WRITE_ONCE(cache->used, old_val); > > > > > > cache->reserved -= num_bytes; > > > > > > cache->reclaim_mark = 0; > > > > > > space_info->bytes_reserved -= num_bytes; > > > > > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans, > > > > > > spin_unlock(&space_info->lock); > > > > > > } else { > > > > > > old_val -= num_bytes; > > > > > > - cache->used = old_val; > > > > > > + WRITE_ONCE(cache->used, old_val); > > > > > > cache->pinned += num_bytes; > > > > > > btrfs_space_info_update_bytes_pinned(space_info, num_bytes); > > > > > > space_info->bytes_used -= num_bytes; > > > > > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c > > > > > > index e815d165cccc..71c57b571d50 100644 > > > > > > --- a/fs/btrfs/discard.c > > > > > > +++ b/fs/btrfs/discard.c > > > > > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl, > > > > > > if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC)) > > > > > > return; > > > > > > > > > > > > - if (block_group->used == 0) > > > > > > + if (READ_ONCE(block_group->used) == 0) > > > > > > > > > > There are at least 3 more places in discard.c where we access ->used > > > > > without being under the protection of the block group's spinlock. > > > > > So let's fix this for all places and not just a single one... > > > > > > > > > > Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place. > > > > > What we typically do in btrfs is to add helpers that hide them, see > > > > > block-rsv.h for example. > > > > > > > > > > Also, I don't think we need READ_ONCE/WRITE_ONCE. > > > > > We could use data_race(), though I think that could be subject to > > > > > load/store tearing, or just take the lock. > > > > > So adding a helper like this to block-group.h: > > > > > > > > > > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg) > > > > > { > > > > > u64 ret; > > > > > > > > > > spin_lock(&bg->lock); > > > > > ret = bg->used; > > > > > spin_unlock(&bg->lock); > > > > > > > > > > return ret; > > > > > } I understand that using lock to protect block_group->used in discard.c file is feasible. In addition, I looked at the code of block-group.c and found that locks have been added in some places where block_group->used are used. , it seems that it is not appropriate to call btrfs_block_group_used again to obtain (because it will cause deadlock). I would like to ask other similar places where block_group->used needs to call btrfs_block_group_used function in addition to the four places in discard.c? > > > > > > > > Would memory barriers be sufficient here? Taking a lock just for > > > > reading one member seems excessive... Do I need to perform performance testing to ensure the impact if I lock it? > > > > > > Do you think there's heavy contention on this lock? 
> > > > This is not about contention. Spin lock should just never be used this > > way. Or any lock actually. The critical section only contains a single > > fetch operation which does not justify using a lock. > > Hence the only guarantee such lock usage provides are the implicit > > memory barriers (from which maybe only one of them is really needed > > depending on the context where this helper is going to be used). > > > > Simply put, the lock is degraded into a memory barrier this way. So > > why not just use the barriers in the first place if only ordering > > guarantees are required? It only makes sense. > > As said earlier, it's a lot easier to reason about lock() and unlock() > calls rather than spreading memory barriers in the write and read > sides. > Historically he had several mistakes with barriers, they're simply not > as straightforward to reason as locks. > > Plus spreading the barriers in the read and write sides makes the code > not so easy to read, not to mention in case of any new sites updating > the member, we'll have to not forget adding a barrier. > > It's just easier to reason and maintain. > > > > > > > Usually I prefer to go the simplest way, and using locks is a lot more > > > straightforward and easier to understand than memory barriers. > > > > How so? Locks provide the same memory barriers implicitly. > > I'm not saying they don't. > I'm talking about easier code to read and maintain. > > > > > > Unless it's clear there's an observable performance penalty, keeping > > > it simple is better. > > > > Exactly. Here is where our opinions differ. For me 'simple' means > > without useless locking. > > I mean especially with locks they should only be used when absolutely > > necessary. In a sense of 'use the right tool for the job'. There are > > more lightweight tools possible in this context. Locks provide > > additional guarantees which are not needed here. > > > > On the other hand I understand that using a lock is a 'better safe > > than sorry' approach which should also work. Just keep in mind that > > spinlock may sleep on RT. > > It's not about a better safe than sorry, but easier to read, reason > and maintain. > > > > > > data_race() may be ok here, at least for one of the unprotected > > > accesses it's fine, but would have to analyse the other 3 cases. > > > > That's exactly my thoughts. Maybe not even the barriers are needed > > after all. This needs to be checked on a per-case basis. > > > > > > > > > > > And then use btrfs_bock_group_used() everywhere in discard.c where we > > > > > aren't holding a block group's lock. > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > add_to_discard_unused_list(discard_ctl, block_group); > > > > > > else > > > > > > add_to_discard_list(discard_ctl, block_group); > > > > > > -- > > > > > > 2.34.1 > > > > > > > > > > > > > > > > >
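On the deadlock concern raised above: the proposed helper is only meant for readers that do not already hold the block group's spinlock. Call sites such as btrfs_update_block_group(), which appear to already run under cache->lock when they touch cache->used, would keep accessing the field directly, so no double locking arises. A rough sketch of that split, assuming the helper quoted earlier in the thread:

        /* block-group.c: already under cache->lock, so access the field directly. */
        spin_lock(&cache->lock);
        old_val = cache->used;
        old_val += num_bytes;
        cache->used = old_val;
        spin_unlock(&cache->lock);

        /* discard.c: no lock held, so go through the helper (it takes bg->lock itself). */
        if (btrfs_block_group_used(block_group) == 0)
                add_to_discard_unused_list(discard_ctl, block_group);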
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index c0a8f7d92acc..c681b97f6835 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 	old_val = cache->used;
 	if (alloc) {
 		old_val += num_bytes;
-		cache->used = old_val;
+		WRITE_ONCE(cache->used, old_val);
 		cache->reserved -= num_bytes;
 		cache->reclaim_mark = 0;
 		space_info->bytes_reserved -= num_bytes;
@@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 		spin_unlock(&space_info->lock);
 	} else {
 		old_val -= num_bytes;
-		cache->used = old_val;
+		WRITE_ONCE(cache->used, old_val);
 		cache->pinned += num_bytes;
 		btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
 		space_info->bytes_used -= num_bytes;
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index e815d165cccc..71c57b571d50 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
 	if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
 		return;
 
-	if (block_group->used == 0)
+	if (READ_ONCE(block_group->used) == 0)
 		add_to_discard_unused_list(discard_ctl, block_group);
 	else
 		add_to_discard_list(discard_ctl, block_group);
A data race may occur when `btrfs_discard_queue_work()` and `btrfs_update_block_group()` are executed concurrently. Specifically, when `btrfs_update_block_group()` reaches the line `cache->used = old_val;` while `btrfs_discard_queue_work()` is evaluating `if (block_group->used == 0)`, the two accesses race, which may cause the block_group to be placed unexpectedly in discard_list or discard_unused_list. The specific call stacks are as follows:

============DATA_RACE============
btrfs_discard_queue_work+0x245/0x500 [btrfs]
__btrfs_add_free_space+0x3066/0x32f0 [btrfs]
btrfs_add_free_space+0x19a/0x200 [btrfs]
unpin_extent_range+0x847/0x2120 [btrfs]
btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
transaction_kthread+0x764/0xc20 [btrfs]
kthread+0x292/0x330
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
============OTHER_INFO============
btrfs_update_block_group+0xa9d/0x2430 [btrfs]
__btrfs_free_extent+0x4f69/0x9920 [btrfs]
__btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
btrfs_run_delayed_refs+0x317/0x770 [btrfs]
flush_space+0x388/0x1440 [btrfs]
btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
process_scheduled_works+0x716/0xf10
worker_thread+0xb6a/0x1190
kthread+0x292/0x330
ret_from_fork+0x4d/0x80
ret_from_fork_asm+0x1a/0x30
=================================

Although `block_group->used` is checked again in `peek_discard_list()`, `block_group->used` is a 64-bit variable, so we still consider the data race here to be unexpected behavior. It is recommended to use `READ_ONCE` and `WRITE_ONCE` for these reads and writes.

Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com>
---
 fs/btrfs/block-group.c | 4 ++--
 fs/btrfs/discard.c     | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)