Message ID | 20230815065559.31546-1-xiaoshoukui@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix BUG_ON condition in btrfs_cancel_balance | expand |
On Tue, Aug 15, 2023 at 02:55:59AM -0400, xiaoshoukui wrote: > Pausing and canceling balance can race to intterupt balance lead to BUG_ON > panic in btrfs_cancel_balance. The BUG_ON condition in btrfs_cancel_balance > does not take this race scenario into account. Seems that it's from times the balance was not cancellable the same way as now. Also it's a good time to switch the BUG_ON to an assertion or handle it properly. > > However, the race condition has no other side effects. We can fix that. > > Reproducing it with panic trace like this: > kernel BUG at fs/btrfs/volumes.c:4618! > RIP: 0010:btrfs_cancel_balance+0x5cf/0x6a0 > Call Trace: > <TASK> > ? do_nanosleep+0x60/0x120 > ? hrtimer_nanosleep+0xb7/0x1a0 > ? sched_core_clone_cookie+0x70/0x70 > btrfs_ioctl_balance_ctl+0x55/0x70 > btrfs_ioctl+0xa46/0xd20 > __x64_sys_ioctl+0x7d/0xa0 > do_syscall_64+0x38/0x80 > entry_SYSCALL_64_after_hwframe+0x63/0xcd > > Race scenario as follows: > > mutex_unlock(&fs_info->balance_mutex); > > -------------------- > > .......issue pause and cancel req in another thread > > -------------------- > > ret = __btrfs_balance(fs_info); > > > > mutex_lock(&fs_info->balance_mutex); > > if (ret == -ECANCELED && atomic_read(&fs_info->balance_pause_req)) { > > btrfs_info(fs_info, "balance: paused"); > > btrfs_exclop_balance(fs_info, BTRFS_EXCLOP_BALANCE_PAUSED); > > } > > Signed-off-by: xiaoshoukui <xiaoshoukui@ruijie.com.cn> > --- > fs/btrfs/volumes.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c > index 2ecb76cf3d91..886d667419ed 100644 > --- a/fs/btrfs/volumes.c > +++ b/fs/btrfs/volumes.c > @@ -4638,8 +4638,7 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info) > } > } > > - BUG_ON(fs_info->balance_ctl || > - test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)); > + BUG_ON(test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)); I'll change to to ASSERT, this is really to verify that the state tracking works properly. > atomic_dec(&fs_info->balance_cancel_req); > mutex_unlock(&fs_info->balance_mutex); > return 0; > -- > 2.34.1
> Seems that it's from times the balance was not cancellable the same way > as now. Also it's a good time to switch the BUG_ON to an assertion or > handle it properly. That's the point. Canceling the balance only takes into account the normal scenarios. Replacing the BUG ON here with an assertion would make the code cleaner. > I'll change to to ASSERT, this is really to verify that the state > tracking works properly. The ASSERT and BUG ON macros have already helped us uncover many hidden issues.
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 2ecb76cf3d91..886d667419ed 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4638,8 +4638,7 @@ int btrfs_cancel_balance(struct btrfs_fs_info *fs_info) } } - BUG_ON(fs_info->balance_ctl || - test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)); + BUG_ON(test_bit(BTRFS_FS_BALANCE_RUNNING, &fs_info->flags)); atomic_dec(&fs_info->balance_cancel_req); mutex_unlock(&fs_info->balance_mutex); return 0;