
btrfs: fix data race when accessing the block_group's used field

Message ID 20250208073829.1176287-1-zhenghaoran154@gmail.com (mailing list archive)
State New
Series btrfs: fix data race when accessing the block_group's used field

Commit Message

haoran zheng Feb. 8, 2025, 7:38 a.m. UTC
A data race may occur when the functions `btrfs_discard_queue_work()`
and `btrfs_update_block_group()` are executed concurrently.
Specifically, when `btrfs_update_block_group()` executes the line
`cache->used = old_val;` while `btrfs_discard_queue_work()` evaluates
`if (block_group->used == 0)`, the two accesses race, which may cause
the block_group to be placed unexpectedly in discard_list or
discard_unused_list. The call stacks involved are as follows:

============DATA_RACE============
 btrfs_discard_queue_work+0x245/0x500 [btrfs]
 __btrfs_add_free_space+0x3066/0x32f0 [btrfs]
 btrfs_add_free_space+0x19a/0x200 [btrfs]
 unpin_extent_range+0x847/0x2120 [btrfs]
 btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
 btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
 transaction_kthread+0x764/0xc20 [btrfs]
 kthread+0x292/0x330
 ret_from_fork+0x4d/0x80
 ret_from_fork_asm+0x1a/0x30
============OTHER_INFO============
 btrfs_update_block_group+0xa9d/0x2430 [btrfs]
 __btrfs_free_extent+0x4f69/0x9920 [btrfs]
 __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
 btrfs_run_delayed_refs+0x317/0x770 [btrfs]
 flush_space+0x388/0x1440 [btrfs]
 btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
 process_scheduled_works+0x716/0xf10
 worker_thread+0xb6a/0x1190
 kthread+0x292/0x330
 ret_from_fork+0x4d/0x80
 ret_from_fork_asm+0x1a/0x30
=================================

Although `block_group->used` is checked again in
`peek_discard_list()`, `block_group->used` is a 64-bit variable, so we
still consider this data race to be unexpected behavior. Fix it by
using `READ_ONCE` for the read and `WRITE_ONCE` for the write.

Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com>
---
 fs/btrfs/block-group.c | 4 ++--
 fs/btrfs/discard.c     | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

Comments

Filipe Manana Feb. 10, 2025, 11:09 a.m. UTC | #1
On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote:
>
> A data race may occur when the function `btrfs_discard_queue_work()`
> and the function `btrfs_update_block_group()` is executed concurrently.
> Specifically, when the `btrfs_update_block_group()` function is executed
> to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()`
> is accessing `if(block_group->used == 0)` will appear with data race,
> which may cause block_group to be placed unexpectedly in discard_list or
> discard_unused_list. The specific function call stack is as follows:
>
> ============DATA_RACE============
>  btrfs_discard_queue_work+0x245/0x500 [btrfs]
>  __btrfs_add_free_space+0x3066/0x32f0 [btrfs]
>  btrfs_add_free_space+0x19a/0x200 [btrfs]
>  unpin_extent_range+0x847/0x2120 [btrfs]
>  btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
>  btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
>  transaction_kthread+0x764/0xc20 [btrfs]
>  kthread+0x292/0x330
>  ret_from_fork+0x4d/0x80
>  ret_from_fork_asm+0x1a/0x30
> ============OTHER_INFO============
>  btrfs_update_block_group+0xa9d/0x2430 [btrfs]
>  __btrfs_free_extent+0x4f69/0x9920 [btrfs]
>  __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
>  btrfs_run_delayed_refs+0x317/0x770 [btrfs]
>  flush_space+0x388/0x1440 [btrfs]
>  btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
>  process_scheduled_works+0x716/0xf10
>  worker_thread+0xb6a/0x1190
>  kthread+0x292/0x330
>  ret_from_fork+0x4d/0x80
>  ret_from_fork_asm+0x1a/0x30
> =================================
>
> Although the `block_group->used` was checked again in the use of the
> `peek_discard_list` function, considering that `block_group->used` is
> a 64-bit variable, we still think that the data race here is an
> unexpected behavior. It is recommended to add `READ_ONCE` and
> `WRITE_ONCE` to read and write.
>
> Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com>
> ---
>  fs/btrfs/block-group.c | 4 ++--
>  fs/btrfs/discard.c     | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index c0a8f7d92acc..c681b97f6835 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
>         old_val = cache->used;
>         if (alloc) {
>                 old_val += num_bytes;
> -               cache->used = old_val;
> +               WRITE_ONCE(cache->used, old_val);
>                 cache->reserved -= num_bytes;
>                 cache->reclaim_mark = 0;
>                 space_info->bytes_reserved -= num_bytes;
> @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
>                 spin_unlock(&space_info->lock);
>         } else {
>                 old_val -= num_bytes;
> -               cache->used = old_val;
> +               WRITE_ONCE(cache->used, old_val);
>                 cache->pinned += num_bytes;
>                 btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
>                 space_info->bytes_used -= num_bytes;
> diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> index e815d165cccc..71c57b571d50 100644
> --- a/fs/btrfs/discard.c
> +++ b/fs/btrfs/discard.c
> @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
>         if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
>                 return;
>
> -       if (block_group->used == 0)
> +       if (READ_ONCE(block_group->used) == 0)

There are at least 3 more places in discard.c where we access ->used
without being under the protection of the block group's spinlock.
So let's fix this for all places and not just a single one...

Also, it's quite ugly to spread READ_ONCE/WRITE_ONCE all over the place.
What we typically do in btrfs is to add helpers that hide them, see
block-rsv.h for example.

Also, I don't think we need READ_ONCE/WRITE_ONCE.
We could use data_race(), though I think that could be subject to
load/store tearing, or just take the lock.
So adding a helper like this to block-group.h:

static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
{
	u64 ret;

	spin_lock(&bg->lock);
	ret = bg->used;
	spin_unlock(&bg->lock);

	return ret;
}

And then use btrfs_block_group_used() everywhere in discard.c where we
aren't holding a block group's lock.

Thanks.


>                 add_to_discard_unused_list(discard_ctl, block_group);
>         else
>                 add_to_discard_list(discard_ctl, block_group);
> --
> 2.34.1
>
>
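
For illustration, a minimal sketch (not part of the patch) of how the
call site in btrfs_discard_queue_work() from the hunk above could look
if the suggested btrfs_block_group_used() helper were added:

	if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
		return;

	/* Locked read instead of the unprotected access to ->used. */
	if (btrfs_block_group_used(block_group) == 0)
		add_to_discard_unused_list(discard_ctl, block_group);
	else
		add_to_discard_list(discard_ctl, block_group);
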
Daniel Vacek Feb. 18, 2025, 8:08 a.m. UTC | #2
On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote:
> >
> > A data race may occur when the function `btrfs_discard_queue_work()`
> > and the function `btrfs_update_block_group()` is executed concurrently.
> > Specifically, when the `btrfs_update_block_group()` function is executed
> > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()`
> > is accessing `if(block_group->used == 0)` will appear with data race,
> > which may cause block_group to be placed unexpectedly in discard_list or
> > discard_unused_list. The specific function call stack is as follows:
> >
> > ============DATA_RACE============
> >  btrfs_discard_queue_work+0x245/0x500 [btrfs]
> >  __btrfs_add_free_space+0x3066/0x32f0 [btrfs]
> >  btrfs_add_free_space+0x19a/0x200 [btrfs]
> >  unpin_extent_range+0x847/0x2120 [btrfs]
> >  btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
> >  btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
> >  transaction_kthread+0x764/0xc20 [btrfs]
> >  kthread+0x292/0x330
> >  ret_from_fork+0x4d/0x80
> >  ret_from_fork_asm+0x1a/0x30
> > ============OTHER_INFO============
> >  btrfs_update_block_group+0xa9d/0x2430 [btrfs]
> >  __btrfs_free_extent+0x4f69/0x9920 [btrfs]
> >  __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
> >  btrfs_run_delayed_refs+0x317/0x770 [btrfs]
> >  flush_space+0x388/0x1440 [btrfs]
> >  btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
> >  process_scheduled_works+0x716/0xf10
> >  worker_thread+0xb6a/0x1190
> >  kthread+0x292/0x330
> >  ret_from_fork+0x4d/0x80
> >  ret_from_fork_asm+0x1a/0x30
> > =================================
> >
> > Although the `block_group->used` was checked again in the use of the
> > `peek_discard_list` function, considering that `block_group->used` is
> > a 64-bit variable, we still think that the data race here is an
> > unexpected behavior. It is recommended to add `READ_ONCE` and
> > `WRITE_ONCE` to read and write.
> >
> > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com>
> > ---
> >  fs/btrfs/block-group.c | 4 ++--
> >  fs/btrfs/discard.c     | 2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > index c0a8f7d92acc..c681b97f6835 100644
> > --- a/fs/btrfs/block-group.c
> > +++ b/fs/btrfs/block-group.c
> > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> >         old_val = cache->used;
> >         if (alloc) {
> >                 old_val += num_bytes;
> > -               cache->used = old_val;
> > +               WRITE_ONCE(cache->used, old_val);
> >                 cache->reserved -= num_bytes;
> >                 cache->reclaim_mark = 0;
> >                 space_info->bytes_reserved -= num_bytes;
> > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> >                 spin_unlock(&space_info->lock);
> >         } else {
> >                 old_val -= num_bytes;
> > -               cache->used = old_val;
> > +               WRITE_ONCE(cache->used, old_val);
> >                 cache->pinned += num_bytes;
> >                 btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
> >                 space_info->bytes_used -= num_bytes;
> > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> > index e815d165cccc..71c57b571d50 100644
> > --- a/fs/btrfs/discard.c
> > +++ b/fs/btrfs/discard.c
> > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
> >         if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
> >                 return;
> >
> > -       if (block_group->used == 0)
> > +       if (READ_ONCE(block_group->used) == 0)
>
> There are at least 3 more places in discard.c where we access ->used
> without being under the protection of the block group's spinlock.
> So let's fix this for all places and not just a single one...
>
> Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place.
> What we typically do in btrfs is to add helpers that hide them, see
> block-rsv.h for example.
>
> Also, I don't think we need READ_ONCE/WRITE_ONCE.
> We could use data_race(), though I think that could be subject to
> load/store tearing, or just take the lock.
> So adding a helper like this to block-group.h:
>
> static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
> {
>    u64 ret;
>
>    spin_lock(&bg->lock);
>    ret = bg->used;
>    spin_unlock(&bg->lock);
>
>     return ret;
> }

Would memory barriers be sufficient here? Taking a lock just for
reading one member seems excessive...

> And then use btrfs_bock_group_used() everywhere in discard.c where we
> aren't holding a block group's lock.
>
> Thanks.
>
>
> >                 add_to_discard_unused_list(discard_ctl, block_group);
> >         else
> >                 add_to_discard_list(discard_ctl, block_group);
> > --
> > 2.34.1
> >
> >
>
Filipe Manana Feb. 18, 2025, 10:18 a.m. UTC | #3
On Tue, Feb 18, 2025 at 8:08 AM Daniel Vacek <neelx@suse.com> wrote:
>
> On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@kernel.org> wrote:
> >
> > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote:
> > >
> > > A data race may occur when the function `btrfs_discard_queue_work()`
> > > and the function `btrfs_update_block_group()` is executed concurrently.
> > > Specifically, when the `btrfs_update_block_group()` function is executed
> > > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()`
> > > is accessing `if(block_group->used == 0)` will appear with data race,
> > > which may cause block_group to be placed unexpectedly in discard_list or
> > > discard_unused_list. The specific function call stack is as follows:
> > >
> > > ============DATA_RACE============
> > >  btrfs_discard_queue_work+0x245/0x500 [btrfs]
> > >  __btrfs_add_free_space+0x3066/0x32f0 [btrfs]
> > >  btrfs_add_free_space+0x19a/0x200 [btrfs]
> > >  unpin_extent_range+0x847/0x2120 [btrfs]
> > >  btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
> > >  btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
> > >  transaction_kthread+0x764/0xc20 [btrfs]
> > >  kthread+0x292/0x330
> > >  ret_from_fork+0x4d/0x80
> > >  ret_from_fork_asm+0x1a/0x30
> > > ============OTHER_INFO============
> > >  btrfs_update_block_group+0xa9d/0x2430 [btrfs]
> > >  __btrfs_free_extent+0x4f69/0x9920 [btrfs]
> > >  __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
> > >  btrfs_run_delayed_refs+0x317/0x770 [btrfs]
> > >  flush_space+0x388/0x1440 [btrfs]
> > >  btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
> > >  process_scheduled_works+0x716/0xf10
> > >  worker_thread+0xb6a/0x1190
> > >  kthread+0x292/0x330
> > >  ret_from_fork+0x4d/0x80
> > >  ret_from_fork_asm+0x1a/0x30
> > > =================================
> > >
> > > Although the `block_group->used` was checked again in the use of the
> > > `peek_discard_list` function, considering that `block_group->used` is
> > > a 64-bit variable, we still think that the data race here is an
> > > unexpected behavior. It is recommended to add `READ_ONCE` and
> > > `WRITE_ONCE` to read and write.
> > >
> > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com>
> > > ---
> > >  fs/btrfs/block-group.c | 4 ++--
> > >  fs/btrfs/discard.c     | 2 +-
> > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > > index c0a8f7d92acc..c681b97f6835 100644
> > > --- a/fs/btrfs/block-group.c
> > > +++ b/fs/btrfs/block-group.c
> > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > >         old_val = cache->used;
> > >         if (alloc) {
> > >                 old_val += num_bytes;
> > > -               cache->used = old_val;
> > > +               WRITE_ONCE(cache->used, old_val);
> > >                 cache->reserved -= num_bytes;
> > >                 cache->reclaim_mark = 0;
> > >                 space_info->bytes_reserved -= num_bytes;
> > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > >                 spin_unlock(&space_info->lock);
> > >         } else {
> > >                 old_val -= num_bytes;
> > > -               cache->used = old_val;
> > > +               WRITE_ONCE(cache->used, old_val);
> > >                 cache->pinned += num_bytes;
> > >                 btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
> > >                 space_info->bytes_used -= num_bytes;
> > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> > > index e815d165cccc..71c57b571d50 100644
> > > --- a/fs/btrfs/discard.c
> > > +++ b/fs/btrfs/discard.c
> > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
> > >         if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
> > >                 return;
> > >
> > > -       if (block_group->used == 0)
> > > +       if (READ_ONCE(block_group->used) == 0)
> >
> > There are at least 3 more places in discard.c where we access ->used
> > without being under the protection of the block group's spinlock.
> > So let's fix this for all places and not just a single one...
> >
> > Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place.
> > What we typically do in btrfs is to add helpers that hide them, see
> > block-rsv.h for example.
> >
> > Also, I don't think we need READ_ONCE/WRITE_ONCE.
> > We could use data_race(), though I think that could be subject to
> > load/store tearing, or just take the lock.
> > So adding a helper like this to block-group.h:
> >
> > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
> > {
> >    u64 ret;
> >
> >    spin_lock(&bg->lock);
> >    ret = bg->used;
> >    spin_unlock(&bg->lock);
> >
> >     return ret;
> > }
>
> Would memory barriers be sufficient here? Taking a lock just for
> reading one member seems excessive...

Do you think there's heavy contention on this lock?

Usually I prefer to go the simplest way, and using locks is a lot more
straightforward and easier to understand than memory barriers.
Unless it's clear there's an observable performance penalty, keeping
it simple is better.

data_race() may be ok here; at least for one of the unprotected
accesses it's fine, but we would have to analyse the other 3 cases.



>
> > And then use btrfs_bock_group_used() everywhere in discard.c where we
> > aren't holding a block group's lock.
> >
> > Thanks.
> >
> >
> > >                 add_to_discard_unused_list(discard_ctl, block_group);
> > >         else
> > >                 add_to_discard_list(discard_ctl, block_group);
> > > --
> > > 2.34.1
> > >
> > >
> >
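
As a reference for the data_race() option mentioned above, a sketch of
what the annotation would look like at this call site; note that
data_race() only marks the access as intentionally racy (for KCSAN)
and, unlike READ_ONCE(), does not by itself prevent load/store tearing:

	if (data_race(block_group->used) == 0)
		add_to_discard_unused_list(discard_ctl, block_group);
	else
		add_to_discard_list(discard_ctl, block_group);
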
Daniel Vacek Feb. 18, 2025, 4:13 p.m. UTC | #4
On Tue, 18 Feb 2025 at 11:19, Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Tue, Feb 18, 2025 at 8:08 AM Daniel Vacek <neelx@suse.com> wrote:
> >
> > On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@kernel.org> wrote:
> > >
> > > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote:
> > > >
> > > > A data race may occur when the function `btrfs_discard_queue_work()`
> > > > and the function `btrfs_update_block_group()` is executed concurrently.
> > > > Specifically, when the `btrfs_update_block_group()` function is executed
> > > > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()`
> > > > is accessing `if(block_group->used == 0)` will appear with data race,
> > > > which may cause block_group to be placed unexpectedly in discard_list or
> > > > discard_unused_list. The specific function call stack is as follows:
> > > >
> > > > ============DATA_RACE============
> > > >  btrfs_discard_queue_work+0x245/0x500 [btrfs]
> > > >  __btrfs_add_free_space+0x3066/0x32f0 [btrfs]
> > > >  btrfs_add_free_space+0x19a/0x200 [btrfs]
> > > >  unpin_extent_range+0x847/0x2120 [btrfs]
> > > >  btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
> > > >  btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
> > > >  transaction_kthread+0x764/0xc20 [btrfs]
> > > >  kthread+0x292/0x330
> > > >  ret_from_fork+0x4d/0x80
> > > >  ret_from_fork_asm+0x1a/0x30
> > > > ============OTHER_INFO============
> > > >  btrfs_update_block_group+0xa9d/0x2430 [btrfs]
> > > >  __btrfs_free_extent+0x4f69/0x9920 [btrfs]
> > > >  __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
> > > >  btrfs_run_delayed_refs+0x317/0x770 [btrfs]
> > > >  flush_space+0x388/0x1440 [btrfs]
> > > >  btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
> > > >  process_scheduled_works+0x716/0xf10
> > > >  worker_thread+0xb6a/0x1190
> > > >  kthread+0x292/0x330
> > > >  ret_from_fork+0x4d/0x80
> > > >  ret_from_fork_asm+0x1a/0x30
> > > > =================================
> > > >
> > > > Although the `block_group->used` was checked again in the use of the
> > > > `peek_discard_list` function, considering that `block_group->used` is
> > > > a 64-bit variable, we still think that the data race here is an
> > > > unexpected behavior. It is recommended to add `READ_ONCE` and
> > > > `WRITE_ONCE` to read and write.
> > > >
> > > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com>
> > > > ---
> > > >  fs/btrfs/block-group.c | 4 ++--
> > > >  fs/btrfs/discard.c     | 2 +-
> > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > > > index c0a8f7d92acc..c681b97f6835 100644
> > > > --- a/fs/btrfs/block-group.c
> > > > +++ b/fs/btrfs/block-group.c
> > > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > > >         old_val = cache->used;
> > > >         if (alloc) {
> > > >                 old_val += num_bytes;
> > > > -               cache->used = old_val;
> > > > +               WRITE_ONCE(cache->used, old_val);
> > > >                 cache->reserved -= num_bytes;
> > > >                 cache->reclaim_mark = 0;
> > > >                 space_info->bytes_reserved -= num_bytes;
> > > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > > >                 spin_unlock(&space_info->lock);
> > > >         } else {
> > > >                 old_val -= num_bytes;
> > > > -               cache->used = old_val;
> > > > +               WRITE_ONCE(cache->used, old_val);
> > > >                 cache->pinned += num_bytes;
> > > >                 btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
> > > >                 space_info->bytes_used -= num_bytes;
> > > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> > > > index e815d165cccc..71c57b571d50 100644
> > > > --- a/fs/btrfs/discard.c
> > > > +++ b/fs/btrfs/discard.c
> > > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
> > > >         if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
> > > >                 return;
> > > >
> > > > -       if (block_group->used == 0)
> > > > +       if (READ_ONCE(block_group->used) == 0)
> > >
> > > There are at least 3 more places in discard.c where we access ->used
> > > without being under the protection of the block group's spinlock.
> > > So let's fix this for all places and not just a single one...
> > >
> > > Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place.
> > > What we typically do in btrfs is to add helpers that hide them, see
> > > block-rsv.h for example.
> > >
> > > Also, I don't think we need READ_ONCE/WRITE_ONCE.
> > > We could use data_race(), though I think that could be subject to
> > > load/store tearing, or just take the lock.
> > > So adding a helper like this to block-group.h:
> > >
> > > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
> > > {
> > >    u64 ret;
> > >
> > >    spin_lock(&bg->lock);
> > >    ret = bg->used;
> > >    spin_unlock(&bg->lock);
> > >
> > >     return ret;
> > > }
> >
> > Would memory barriers be sufficient here? Taking a lock just for
> > reading one member seems excessive...
>
> Do you think there's heavy contention on this lock?

This is not about contention. A spin lock should just never be used
this way. Or any lock, actually. The critical section only contains a
single fetch operation, which does not justify using a lock.
Hence the only guarantees such lock usage provides are the implicit
memory barriers (of which maybe only one is really needed, depending
on the context where this helper is going to be used).

Simply put, the lock is degraded into a memory barrier this way. So
why not just use the barriers in the first place if only ordering
guarantees are required? It only makes sense.

> Usually I prefer to go the simplest way, and using locks is a lot more
> straightforward and easier to understand than memory barriers.

How so? Locks provide the same memory barriers implicitly.

> Unless it's clear there's an observable performance penalty, keeping
> it simple is better.

Exactly. Here is where our opinions differ. For me 'simple' means
without useless locking. Locks especially should only be used when
absolutely necessary, in the sense of 'use the right tool for the
job'. There are more lightweight tools available in this context.
Locks provide additional guarantees which are not needed here.

On the other hand I understand that using a lock is a 'better safe
than sorry' approach which should also work. Just keep in mind that
spinlock may sleep on RT.

> data_race() may be ok here, at least for one of the unprotected
> accesses it's fine, but would have to analyse the other 3 cases.

That's exactly my thoughts. Maybe not even the barriers are needed
after all. This needs to be checked on a per-case basis.

> >
> > > And then use btrfs_bock_group_used() everywhere in discard.c where we
> > > aren't holding a block group's lock.
> > >
> > > Thanks.
> > >
> > >
> > > >                 add_to_discard_unused_list(discard_ctl, block_group);
> > > >         else
> > > >                 add_to_discard_list(discard_ctl, block_group);
> > > > --
> > > > 2.34.1
> > > >
> > > >
> > >
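
One possible reading of the barrier-based alternative, sketched with
smp_store_release()/smp_load_acquire(); whether any ordering beyond
plain READ_ONCE()/WRITE_ONCE() is actually required would need the
per-call-site analysis mentioned above:

	/* Writer side, in btrfs_update_block_group() (sketch only): */
	smp_store_release(&cache->used, old_val);

	/* Reader side, in btrfs_discard_queue_work() (sketch only): */
	if (smp_load_acquire(&block_group->used) == 0)
		add_to_discard_unused_list(discard_ctl, block_group);
	else
		add_to_discard_list(discard_ctl, block_group);
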
Filipe Manana Feb. 18, 2025, 4:28 p.m. UTC | #5
On Tue, Feb 18, 2025 at 4:14 PM Daniel Vacek <neelx@suse.com> wrote:
>
> On Tue, 18 Feb 2025 at 11:19, Filipe Manana <fdmanana@kernel.org> wrote:
> >
> > On Tue, Feb 18, 2025 at 8:08 AM Daniel Vacek <neelx@suse.com> wrote:
> > >
> > > On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@kernel.org> wrote:
> > > >
> > > > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote:
> > > > >
> > > > > A data race may occur when the function `btrfs_discard_queue_work()`
> > > > > and the function `btrfs_update_block_group()` is executed concurrently.
> > > > > Specifically, when the `btrfs_update_block_group()` function is executed
> > > > > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()`
> > > > > is accessing `if(block_group->used == 0)` will appear with data race,
> > > > > which may cause block_group to be placed unexpectedly in discard_list or
> > > > > discard_unused_list. The specific function call stack is as follows:
> > > > >
> > > > > ============DATA_RACE============
> > > > >  btrfs_discard_queue_work+0x245/0x500 [btrfs]
> > > > >  __btrfs_add_free_space+0x3066/0x32f0 [btrfs]
> > > > >  btrfs_add_free_space+0x19a/0x200 [btrfs]
> > > > >  unpin_extent_range+0x847/0x2120 [btrfs]
> > > > >  btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
> > > > >  btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
> > > > >  transaction_kthread+0x764/0xc20 [btrfs]
> > > > >  kthread+0x292/0x330
> > > > >  ret_from_fork+0x4d/0x80
> > > > >  ret_from_fork_asm+0x1a/0x30
> > > > > ============OTHER_INFO============
> > > > >  btrfs_update_block_group+0xa9d/0x2430 [btrfs]
> > > > >  __btrfs_free_extent+0x4f69/0x9920 [btrfs]
> > > > >  __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
> > > > >  btrfs_run_delayed_refs+0x317/0x770 [btrfs]
> > > > >  flush_space+0x388/0x1440 [btrfs]
> > > > >  btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
> > > > >  process_scheduled_works+0x716/0xf10
> > > > >  worker_thread+0xb6a/0x1190
> > > > >  kthread+0x292/0x330
> > > > >  ret_from_fork+0x4d/0x80
> > > > >  ret_from_fork_asm+0x1a/0x30
> > > > > =================================
> > > > >
> > > > > Although the `block_group->used` was checked again in the use of the
> > > > > `peek_discard_list` function, considering that `block_group->used` is
> > > > > a 64-bit variable, we still think that the data race here is an
> > > > > unexpected behavior. It is recommended to add `READ_ONCE` and
> > > > > `WRITE_ONCE` to read and write.
> > > > >
> > > > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com>
> > > > > ---
> > > > >  fs/btrfs/block-group.c | 4 ++--
> > > > >  fs/btrfs/discard.c     | 2 +-
> > > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > > > > index c0a8f7d92acc..c681b97f6835 100644
> > > > > --- a/fs/btrfs/block-group.c
> > > > > +++ b/fs/btrfs/block-group.c
> > > > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > > > >         old_val = cache->used;
> > > > >         if (alloc) {
> > > > >                 old_val += num_bytes;
> > > > > -               cache->used = old_val;
> > > > > +               WRITE_ONCE(cache->used, old_val);
> > > > >                 cache->reserved -= num_bytes;
> > > > >                 cache->reclaim_mark = 0;
> > > > >                 space_info->bytes_reserved -= num_bytes;
> > > > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > > > >                 spin_unlock(&space_info->lock);
> > > > >         } else {
> > > > >                 old_val -= num_bytes;
> > > > > -               cache->used = old_val;
> > > > > +               WRITE_ONCE(cache->used, old_val);
> > > > >                 cache->pinned += num_bytes;
> > > > >                 btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
> > > > >                 space_info->bytes_used -= num_bytes;
> > > > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> > > > > index e815d165cccc..71c57b571d50 100644
> > > > > --- a/fs/btrfs/discard.c
> > > > > +++ b/fs/btrfs/discard.c
> > > > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
> > > > >         if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
> > > > >                 return;
> > > > >
> > > > > -       if (block_group->used == 0)
> > > > > +       if (READ_ONCE(block_group->used) == 0)
> > > >
> > > > There are at least 3 more places in discard.c where we access ->used
> > > > without being under the protection of the block group's spinlock.
> > > > So let's fix this for all places and not just a single one...
> > > >
> > > > Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place.
> > > > What we typically do in btrfs is to add helpers that hide them, see
> > > > block-rsv.h for example.
> > > >
> > > > Also, I don't think we need READ_ONCE/WRITE_ONCE.
> > > > We could use data_race(), though I think that could be subject to
> > > > load/store tearing, or just take the lock.
> > > > So adding a helper like this to block-group.h:
> > > >
> > > > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
> > > > {
> > > >    u64 ret;
> > > >
> > > >    spin_lock(&bg->lock);
> > > >    ret = bg->used;
> > > >    spin_unlock(&bg->lock);
> > > >
> > > >     return ret;
> > > > }
> > >
> > > Would memory barriers be sufficient here? Taking a lock just for
> > > reading one member seems excessive...
> >
> > Do you think there's heavy contention on this lock?
>
> This is not about contention. Spin lock should just never be used this
> way. Or any lock actually. The critical section only contains a single
> fetch operation which does not justify using a lock.
> Hence the only guarantee such lock usage provides are the implicit
> memory barriers (from which maybe only one of them is really needed
> depending on the context where this helper is going to be used).
>
> Simply put, the lock is degraded into a memory barrier this way. So
> why not just use the barriers in the first place if only ordering
> guarantees are required? It only makes sense.

As said earlier, it's a lot easier to reason about lock() and unlock()
calls than about memory barriers spread across the write and read
sides.
Historically we had several mistakes with barriers; they're simply not
as straightforward to reason about as locks.

Plus spreading the barriers across the read and write sides makes the
code not so easy to read, not to mention that for any new site
updating the member, we'll have to remember to add a barrier.

It's just easier to reason about and maintain.


>
> > Usually I prefer to go the simplest way, and using locks is a lot more
> > straightforward and easier to understand than memory barriers.
>
> How so? Locks provide the same memory barriers implicitly.

I'm not saying they don't.
I'm talking about code that is easier to read and maintain.

>
> > Unless it's clear there's an observable performance penalty, keeping
> > it simple is better.
>
> Exactly. Here is where our opinions differ. For me 'simple' means
> without useless locking.
> I mean especially with locks they should only be used when absolutely
> necessary. In a sense of 'use the right tool for the job'. There are
> more lightweight tools possible in this context. Locks provide
> additional guarantees which are not needed here.
>
> On the other hand I understand that using a lock is a 'better safe
> than sorry' approach which should also work. Just keep in mind that
> spinlock may sleep on RT.

It's not about better safe than sorry, but about being easier to
read, reason about and maintain.

>
> > data_race() may be ok here, at least for one of the unprotected
> > accesses it's fine, but would have to analyse the other 3 cases.
>
> That's exactly my thoughts. Maybe not even the barriers are needed
> after all. This needs to be checked on a per-case basis.
>
> > >
> > > > And then use btrfs_bock_group_used() everywhere in discard.c where we
> > > > aren't holding a block group's lock.
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >                 add_to_discard_unused_list(discard_ctl, block_group);
> > > > >         else
> > > > >                 add_to_discard_list(discard_ctl, block_group);
> > > > > --
> > > > > 2.34.1
> > > > >
> > > > >
> > > >
Daniel Vacek Feb. 19, 2025, 7:47 a.m. UTC | #6
On Tue, 18 Feb 2025 at 17:29, Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Tue, Feb 18, 2025 at 4:14 PM Daniel Vacek <neelx@suse.com> wrote:
> >
> > On Tue, 18 Feb 2025 at 11:19, Filipe Manana <fdmanana@kernel.org> wrote:
> > >
> > > On Tue, Feb 18, 2025 at 8:08 AM Daniel Vacek <neelx@suse.com> wrote:
> > > >
> > > > On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@kernel.org> wrote:
> > > > >
> > > > > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote:
> > > > > >
> > > > > > A data race may occur when the function `btrfs_discard_queue_work()`
> > > > > > and the function `btrfs_update_block_group()` is executed concurrently.
> > > > > > Specifically, when the `btrfs_update_block_group()` function is executed
> > > > > > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()`
> > > > > > is accessing `if(block_group->used == 0)` will appear with data race,
> > > > > > which may cause block_group to be placed unexpectedly in discard_list or
> > > > > > discard_unused_list. The specific function call stack is as follows:
> > > > > >
> > > > > > ============DATA_RACE============
> > > > > >  btrfs_discard_queue_work+0x245/0x500 [btrfs]
> > > > > >  __btrfs_add_free_space+0x3066/0x32f0 [btrfs]
> > > > > >  btrfs_add_free_space+0x19a/0x200 [btrfs]
> > > > > >  unpin_extent_range+0x847/0x2120 [btrfs]
> > > > > >  btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
> > > > > >  btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
> > > > > >  transaction_kthread+0x764/0xc20 [btrfs]
> > > > > >  kthread+0x292/0x330
> > > > > >  ret_from_fork+0x4d/0x80
> > > > > >  ret_from_fork_asm+0x1a/0x30
> > > > > > ============OTHER_INFO============
> > > > > >  btrfs_update_block_group+0xa9d/0x2430 [btrfs]
> > > > > >  __btrfs_free_extent+0x4f69/0x9920 [btrfs]
> > > > > >  __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
> > > > > >  btrfs_run_delayed_refs+0x317/0x770 [btrfs]
> > > > > >  flush_space+0x388/0x1440 [btrfs]
> > > > > >  btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
> > > > > >  process_scheduled_works+0x716/0xf10
> > > > > >  worker_thread+0xb6a/0x1190
> > > > > >  kthread+0x292/0x330
> > > > > >  ret_from_fork+0x4d/0x80
> > > > > >  ret_from_fork_asm+0x1a/0x30
> > > > > > =================================
> > > > > >
> > > > > > Although the `block_group->used` was checked again in the use of the
> > > > > > `peek_discard_list` function, considering that `block_group->used` is
> > > > > > a 64-bit variable, we still think that the data race here is an
> > > > > > unexpected behavior. It is recommended to add `READ_ONCE` and
> > > > > > `WRITE_ONCE` to read and write.
> > > > > >
> > > > > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com>
> > > > > > ---
> > > > > >  fs/btrfs/block-group.c | 4 ++--
> > > > > >  fs/btrfs/discard.c     | 2 +-
> > > > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > > > > > index c0a8f7d92acc..c681b97f6835 100644
> > > > > > --- a/fs/btrfs/block-group.c
> > > > > > +++ b/fs/btrfs/block-group.c
> > > > > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > > > > >         old_val = cache->used;
> > > > > >         if (alloc) {
> > > > > >                 old_val += num_bytes;
> > > > > > -               cache->used = old_val;
> > > > > > +               WRITE_ONCE(cache->used, old_val);
> > > > > >                 cache->reserved -= num_bytes;
> > > > > >                 cache->reclaim_mark = 0;
> > > > > >                 space_info->bytes_reserved -= num_bytes;
> > > > > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > > > > >                 spin_unlock(&space_info->lock);
> > > > > >         } else {
> > > > > >                 old_val -= num_bytes;
> > > > > > -               cache->used = old_val;
> > > > > > +               WRITE_ONCE(cache->used, old_val);
> > > > > >                 cache->pinned += num_bytes;
> > > > > >                 btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
> > > > > >                 space_info->bytes_used -= num_bytes;
> > > > > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> > > > > > index e815d165cccc..71c57b571d50 100644
> > > > > > --- a/fs/btrfs/discard.c
> > > > > > +++ b/fs/btrfs/discard.c
> > > > > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
> > > > > >         if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
> > > > > >                 return;
> > > > > >
> > > > > > -       if (block_group->used == 0)
> > > > > > +       if (READ_ONCE(block_group->used) == 0)
> > > > >
> > > > > There are at least 3 more places in discard.c where we access ->used
> > > > > without being under the protection of the block group's spinlock.
> > > > > So let's fix this for all places and not just a single one...
> > > > >
> > > > > Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place.
> > > > > What we typically do in btrfs is to add helpers that hide them, see
> > > > > block-rsv.h for example.
> > > > >
> > > > > Also, I don't think we need READ_ONCE/WRITE_ONCE.
> > > > > We could use data_race(), though I think that could be subject to
> > > > > load/store tearing, or just take the lock.
> > > > > So adding a helper like this to block-group.h:
> > > > >
> > > > > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
> > > > > {
> > > > >    u64 ret;
> > > > >
> > > > >    spin_lock(&bg->lock);
> > > > >    ret = bg->used;
> > > > >    spin_unlock(&bg->lock);
> > > > >
> > > > >     return ret;
> > > > > }
> > > >
> > > > Would memory barriers be sufficient here? Taking a lock just for
> > > > reading one member seems excessive...
> > >
> > > Do you think there's heavy contention on this lock?
> >
> > This is not about contention. Spin lock should just never be used this
> > way. Or any lock actually. The critical section only contains a single
> > fetch operation which does not justify using a lock.
> > Hence the only guarantee such lock usage provides are the implicit
> > memory barriers (from which maybe only one of them is really needed
> > depending on the context where this helper is going to be used).
> >
> > Simply put, the lock is degraded into a memory barrier this way. So
> > why not just use the barriers in the first place if only ordering
> > guarantees are required? It only makes sense.
>
> As said earlier, it's a lot easier to reason about lock() and unlock()
> calls rather than spreading memory barriers in the write and read
> sides.
> Historically he had several mistakes with barriers, they're simply not
> as straightforward to reason as locks.
>
> Plus spreading the barriers in the read and write sides makes the code
> not so easy to read, not to mention in case of any new sites updating
> the member, we'll have to not forget adding a barrier.
>
> It's just easier to reason and maintain.
>
>
> >
> > > Usually I prefer to go the simplest way, and using locks is a lot more
> > > straightforward and easier to understand than memory barriers.
> >
> > How so? Locks provide the same memory barriers implicitly.
>
> I'm not saying they don't.
> I'm talking about easier code to read and maintain.
>
> >
> > > Unless it's clear there's an observable performance penalty, keeping
> > > it simple is better.
> >
> > Exactly. Here is where our opinions differ. For me 'simple' means
> > without useless locking.
> > I mean especially with locks they should only be used when absolutely
> > necessary. In a sense of 'use the right tool for the job'. There are
> > more lightweight tools possible in this context. Locks provide
> > additional guarantees which are not needed here.
> >
> > On the other hand I understand that using a lock is a 'better safe
> > than sorry' approach which should also work. Just keep in mind that
> > spinlock may sleep on RT.
>
> It's not about a better safe than sorry, but easier to read, reason
> and maintain.

I see. Understood. I guess I'm just micro-optimizing where it's not
really needed, and that only adds complexity in a way.

> >
> > > data_race() may be ok here, at least for one of the unprotected
> > > accesses it's fine, but would have to analyse the other 3 cases.
> >
> > That's exactly my thoughts. Maybe not even the barriers are needed
> > after all. This needs to be checked on a per-case basis.
> >
> > > >
> > > > > And then use btrfs_bock_group_used() everywhere in discard.c where we
> > > > > aren't holding a block group's lock.
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >                 add_to_discard_unused_list(discard_ctl, block_group);
> > > > > >         else
> > > > > >                 add_to_discard_list(discard_ctl, block_group);
> > > > > > --
> > > > > > 2.34.1
> > > > > >
> > > > > >
> > > > >
haoran zheng Feb. 19, 2025, 8:52 a.m. UTC | #7
On Wed, Feb 19, 2025 at 12:29 AM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Tue, Feb 18, 2025 at 4:14 PM Daniel Vacek <neelx@suse.com> wrote:
> >
> > On Tue, 18 Feb 2025 at 11:19, Filipe Manana <fdmanana@kernel.org> wrote:
> > >
> > > On Tue, Feb 18, 2025 at 8:08 AM Daniel Vacek <neelx@suse.com> wrote:
> > > >
> > > > On Mon, 10 Feb 2025 at 12:11, Filipe Manana <fdmanana@kernel.org> wrote:
> > > > >
> > > > > On Sat, Feb 8, 2025 at 7:38 AM Hao-ran Zheng <zhenghaoran154@gmail.com> wrote:
> > > > > >
> > > > > > A data race may occur when the function `btrfs_discard_queue_work()`
> > > > > > and the function `btrfs_update_block_group()` is executed concurrently.
> > > > > > Specifically, when the `btrfs_update_block_group()` function is executed
> > > > > > to lines `cache->used = old_val;`, and `btrfs_discard_queue_work()`
> > > > > > is accessing `if(block_group->used == 0)` will appear with data race,
> > > > > > which may cause block_group to be placed unexpectedly in discard_list or
> > > > > > discard_unused_list. The specific function call stack is as follows:
> > > > > >
> > > > > > ============DATA_RACE============
> > > > > >  btrfs_discard_queue_work+0x245/0x500 [btrfs]
> > > > > >  __btrfs_add_free_space+0x3066/0x32f0 [btrfs]
> > > > > >  btrfs_add_free_space+0x19a/0x200 [btrfs]
> > > > > >  unpin_extent_range+0x847/0x2120 [btrfs]
> > > > > >  btrfs_finish_extent_commit+0x9a3/0x1840 [btrfs]
> > > > > >  btrfs_commit_transaction+0x5f65/0xc0f0 [btrfs]
> > > > > >  transaction_kthread+0x764/0xc20 [btrfs]
> > > > > >  kthread+0x292/0x330
> > > > > >  ret_from_fork+0x4d/0x80
> > > > > >  ret_from_fork_asm+0x1a/0x30
> > > > > > ============OTHER_INFO============
> > > > > >  btrfs_update_block_group+0xa9d/0x2430 [btrfs]
> > > > > >  __btrfs_free_extent+0x4f69/0x9920 [btrfs]
> > > > > >  __btrfs_run_delayed_refs+0x290e/0xd7d0 [btrfs]
> > > > > >  btrfs_run_delayed_refs+0x317/0x770 [btrfs]
> > > > > >  flush_space+0x388/0x1440 [btrfs]
> > > > > >  btrfs_preempt_reclaim_metadata_space+0xd65/0x14c0 [btrfs]
> > > > > >  process_scheduled_works+0x716/0xf10
> > > > > >  worker_thread+0xb6a/0x1190
> > > > > >  kthread+0x292/0x330
> > > > > >  ret_from_fork+0x4d/0x80
> > > > > >  ret_from_fork_asm+0x1a/0x30
> > > > > > =================================
> > > > > >
> > > > > > Although the `block_group->used` was checked again in the use of the
> > > > > > `peek_discard_list` function, considering that `block_group->used` is
> > > > > > a 64-bit variable, we still think that the data race here is an
> > > > > > unexpected behavior. It is recommended to add `READ_ONCE` and
> > > > > > `WRITE_ONCE` to read and write.
> > > > > >
> > > > > > Signed-off-by: Hao-ran Zheng <zhenghaoran154@gmail.com>
> > > > > > ---
> > > > > >  fs/btrfs/block-group.c | 4 ++--
> > > > > >  fs/btrfs/discard.c     | 2 +-
> > > > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> > > > > > index c0a8f7d92acc..c681b97f6835 100644
> > > > > > --- a/fs/btrfs/block-group.c
> > > > > > +++ b/fs/btrfs/block-group.c
> > > > > > @@ -3678,7 +3678,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > > > > >         old_val = cache->used;
> > > > > >         if (alloc) {
> > > > > >                 old_val += num_bytes;
> > > > > > -               cache->used = old_val;
> > > > > > +               WRITE_ONCE(cache->used, old_val);
> > > > > >                 cache->reserved -= num_bytes;
> > > > > >                 cache->reclaim_mark = 0;
> > > > > >                 space_info->bytes_reserved -= num_bytes;
> > > > > > @@ -3690,7 +3690,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> > > > > >                 spin_unlock(&space_info->lock);
> > > > > >         } else {
> > > > > >                 old_val -= num_bytes;
> > > > > > -               cache->used = old_val;
> > > > > > +               WRITE_ONCE(cache->used, old_val);
> > > > > >                 cache->pinned += num_bytes;
> > > > > >                 btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
> > > > > >                 space_info->bytes_used -= num_bytes;
> > > > > > diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
> > > > > > index e815d165cccc..71c57b571d50 100644
> > > > > > --- a/fs/btrfs/discard.c
> > > > > > +++ b/fs/btrfs/discard.c
> > > > > > @@ -363,7 +363,7 @@ void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
> > > > > >         if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
> > > > > >                 return;
> > > > > >
> > > > > > -       if (block_group->used == 0)
> > > > > > +       if (READ_ONCE(block_group->used) == 0)
> > > > >
> > > > > There are at least 3 more places in discard.c where we access ->used
> > > > > without being under the protection of the block group's spinlock.
> > > > > So let's fix this for all places and not just a single one...
> > > > >
> > > > > Also, this is quite ugly to spread READ_ONCE/WRITE_ONCE all over the place.
> > > > > What we typically do in btrfs is to add helpers that hide them, see
> > > > > block-rsv.h for example.
> > > > >
> > > > > Also, I don't think we need READ_ONCE/WRITE_ONCE.
> > > > > We could use data_race(), though I think that could be subject to
> > > > > load/store tearing, or just take the lock.
> > > > > So adding a helper like this to block-group.h:
> > > > >
> > > > > static inline u64 btrfs_block_group_used(struct btrfs_block_group *bg)
> > > > > {
> > > > >    u64 ret;
> > > > >
> > > > >    spin_lock(&bg->lock);
> > > > >    ret = bg->used;
> > > > >    spin_unlock(&bg->lock);
> > > > >
> > > > >     return ret;
> > > > > }

I understand that using a lock to protect block_group->used
in discard.c is feasible. In addition, I looked at the code
in block-group.c and found that the lock is already held in
some places where block_group->used is accessed, so it
seems inappropriate to call btrfs_block_group_used() there
(it would cause a deadlock). Besides the four places in
discard.c, are there other similar places where
block_group->used should be read through the
btrfs_block_group_used() function?

> > > >
> > > > Would memory barriers be sufficient here? Taking a lock just for
> > > > reading one member seems excessive...

Do I need to perform performance testing to assess the impact of taking the lock?

> > >
> > > Do you think there's heavy contention on this lock?
> >
> > This is not about contention. Spin lock should just never be used this
> > way. Or any lock actually. The critical section only contains a single
> > fetch operation which does not justify using a lock.
> > Hence the only guarantee such lock usage provides are the implicit
> > memory barriers (from which maybe only one of them is really needed
> > depending on the context where this helper is going to be used).
> >
> > Simply put, the lock is degraded into a memory barrier this way. So
> > why not just use the barriers in the first place if only ordering
> > guarantees are required? It only makes sense.
>
> As said earlier, it's a lot easier to reason about lock() and unlock()
> calls rather than spreading memory barriers in the write and read
> sides.
> Historically he had several mistakes with barriers, they're simply not
> as straightforward to reason as locks.
>
> Plus spreading the barriers in the read and write sides makes the code
> not so easy to read, not to mention in case of any new sites updating
> the member, we'll have to not forget adding a barrier.
>
> It's just easier to reason and maintain.
>
>
> >
> > > Usually I prefer to go the simplest way, and using locks is a lot more
> > > straightforward and easier to understand than memory barriers.
> >
> > How so? Locks provide the same memory barriers implicitly.
>
> I'm not saying they don't.
> I'm talking about easier code to read and maintain.
>
> >
> > > Unless it's clear there's an observable performance penalty, keeping
> > > it simple is better.
> >
> > Exactly. Here is where our opinions differ. For me 'simple' means
> > without useless locking.
> > I mean especially with locks they should only be used when absolutely
> > necessary. In a sense of 'use the right tool for the job'. There are
> > more lightweight tools possible in this context. Locks provide
> > additional guarantees which are not needed here.
> >
> > On the other hand I understand that using a lock is a 'better safe
> > than sorry' approach which should also work. Just keep in mind that
> > spinlock may sleep on RT.
>
> It's not about a better safe than sorry, but easier to read, reason
> and maintain.
>
> >
> > > data_race() may be ok here, at least for one of the unprotected
> > > accesses it's fine, but would have to analyse the other 3 cases.
> >
> > That's exactly my thoughts. Maybe not even the barriers are needed
> > after all. This needs to be checked on a per-case basis.
> >
> > > >
> > > > > And then use btrfs_bock_group_used() everywhere in discard.c where we
> > > > > aren't holding a block group's lock.
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >                 add_to_discard_unused_list(discard_ctl, block_group);
> > > > > >         else
> > > > > >                 add_to_discard_list(discard_ctl, block_group);
> > > > > > --
> > > > > > 2.34.1
> > > > > >
> > > > > >
> > > > >
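
A hypothetical illustration of the deadlock concern raised above: the
locked helper can only be used at call sites that do not already hold
the block group's spinlock:

	spin_lock(&block_group->lock);
	/* ... other work done under the lock ... */
	used = block_group->used;	/* fine: the lock is already held */
	/*
	 * Calling btrfs_block_group_used(block_group) here would try to
	 * take block_group->lock again and deadlock, since kernel
	 * spinlocks are not recursive.
	 */
	spin_unlock(&block_group->lock);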

Patch

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index c0a8f7d92acc..c681b97f6835 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3678,7 +3678,7 @@  int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 	old_val = cache->used;
 	if (alloc) {
 		old_val += num_bytes;
-		cache->used = old_val;
+		WRITE_ONCE(cache->used, old_val);
 		cache->reserved -= num_bytes;
 		cache->reclaim_mark = 0;
 		space_info->bytes_reserved -= num_bytes;
@@ -3690,7 +3690,7 @@  int btrfs_update_block_group(struct btrfs_trans_handle *trans,
 		spin_unlock(&space_info->lock);
 	} else {
 		old_val -= num_bytes;
-		cache->used = old_val;
+		WRITE_ONCE(cache->used, old_val);
 		cache->pinned += num_bytes;
 		btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
 		space_info->bytes_used -= num_bytes;
diff --git a/fs/btrfs/discard.c b/fs/btrfs/discard.c
index e815d165cccc..71c57b571d50 100644
--- a/fs/btrfs/discard.c
+++ b/fs/btrfs/discard.c
@@ -363,7 +363,7 @@  void btrfs_discard_queue_work(struct btrfs_discard_ctl *discard_ctl,
 	if (!block_group || !btrfs_test_opt(block_group->fs_info, DISCARD_ASYNC))
 		return;
 
-	if (block_group->used == 0)
+	if (READ_ONCE(block_group->used) == 0)
 		add_to_discard_unused_list(discard_ctl, block_group);
 	else
 		add_to_discard_list(discard_ctl, block_group);