[v2] dm-raid: Fix WARN_ON_ONCE check for sync_thread in raid_resume

Message ID: 20240612181222.877577-1-bmarzins@redhat.com
State: Superseded, archived

Checks

Context                              Check    Description
mdraidci/vmtest-md-6_11-PR           success  PR summary
mdraidci/vmtest-md-6_11-VM_Test-0    success  Logs for build-kernel

Commit Message

Benjamin Marzinski June 12, 2024, 6:12 p.m. UTC
dm-raid devices will occasionally trigger the following warning when
being resumed after a table load because MD_RECOVERY_RUNNING is set:

WARNING: CPU: 7 PID: 5660 at drivers/md/dm-raid.c:4105 raid_resume+0xee/0x100 [dm_raid]

The failing check is:
WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));

This check is designed to make sure that the sync thread isn't
registered, but md_check_recovery can set MD_RECOVERY_RUNNING without
the sync_thread ever getting registered. Instead of checking if
MD_RECOVERY_RUNNING is set, check if sync_thread is non-NULL.

Fixes: 16c4770c75b1 ("dm-raid: really frozen sync_thread during suspend")
Suggested-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
Changes in v2:
 - Move mddev_lock_nointr() earlier to protect dereference and use
   rcu_dereference_protected() to access sync_thread

 drivers/md/dm-raid.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
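
The distinction the commit message draws (a "running" flag that can be set while no sync thread is registered) can be illustrated with a small user-space model. This is only a sketch: struct model_mddev, the MODEL_* flags and both check helpers are hypothetical stand-ins invented for illustration, not the kernel's struct mddev or its API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the recovery flag bits (not kernel definitions). */
enum {
	MODEL_RECOVERY_RUNNING = 1u << 0,
	MODEL_RECOVERY_FROZEN  = 1u << 1,
};

struct model_thread {
	int id;
};

/* Hypothetical, heavily reduced model of the state raid_resume() inspects. */
struct model_mddev {
	unsigned int recovery;            /* flag word, like mddev->recovery */
	struct model_thread *sync_thread; /* NULL until a thread is registered */
};

/* Old check: treats the RUNNING flag as proof that a thread is registered. */
static bool old_check_warns(const struct model_mddev *m)
{
	return (m->recovery & MODEL_RECOVERY_RUNNING) != 0;
}

/* New check: looks at the thread pointer itself. */
static bool new_check_warns(const struct model_mddev *m)
{
	return m->sync_thread != NULL;
}

/*
 * The situation described in the commit message: the RUNNING flag has
 * been set, but no sync thread was ever registered.
 */
static struct model_mddev model_flag_set_without_thread(void)
{
	struct model_mddev m = {
		.recovery = MODEL_RECOVERY_RUNNING | MODEL_RECOVERY_FROZEN,
		.sync_thread = NULL,
	};
	return m;
}
```

In this model, calling old_check_warns() on the state returned by model_flag_set_without_thread() reports a warning even though no thread exists, while new_check_warns() does not. That is the false positive the patch is meant to remove.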

Comments

Yu Kuai June 13, 2024, 3:50 a.m. UTC | #1
Hi,

On 2024/06/13 2:12, Benjamin Marzinski wrote:
> rm-raid devices will occasionally trigger the following warning when
Typo: this should be dm-raid.

> being resumed after a table load because DM_RECOVERY_RUNNING is set:
> 
> WARNING: CPU: 7 PID: 5660 at drivers/md/dm-raid.c:4105 raid_resume+0xee/0x100 [dm_raid]
> 
> The failing check is:
> WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
> 
> This check is designed to make sure that the sync thread isn't
> registered, but md_check_recovery can set MD_RECOVERY_RUNNING without
> the sync_thread ever getting registered. Instead of checking if
> MD_RECOVERY_RUNNING is set, check if sync_thread is non-NULL.
> 
> Fixes: 16c4770c75b1 ("dm-raid: really frozen sync_thread during suspend")
> Suggested-by: Yu Kuai <yukuai1@huaweicloud.com>
Please use the address yukuai3@huawei.com

> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> Changes in v2:
>   - Move mddev_lock_nointr() earlier to protect dereference and use
>     rcu_dereference_protected() to access sync_thread
> 
>   drivers/md/dm-raid.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
> index abe88d1e6735..b149ac46a990 100644
> --- a/drivers/md/dm-raid.c
> +++ b/drivers/md/dm-raid.c
> @@ -4101,10 +4101,11 @@ static void raid_resume(struct dm_target *ti)
>   		if (mddev->delta_disks < 0)
>   			rs_set_capacity(rs);
>   
> +		mddev_lock_nointr(mddev);
>   		WARN_ON_ONCE(!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery));
> -		WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
> +		WARN_ON_ONCE(rcu_dereference_protected(mddev->sync_thread,
> +						       lockdep_is_held(&mddev->reconfig_mutex)));
>   		clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
> -		mddev_lock_nointr(mddev);

Other than the typo, LGTM

Suggested-and-reviewed-by: Yu Kuai <yukuai3@huawei.com>
>   		mddev->ro = 0;
>   		mddev->in_sync = 0;
>   		md_unfrozen_sync_thread(mddev);
>
Patch

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index abe88d1e6735..b149ac46a990 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -4101,10 +4101,11 @@  static void raid_resume(struct dm_target *ti)
 		if (mddev->delta_disks < 0)
 			rs_set_capacity(rs);
 
+		mddev_lock_nointr(mddev);
 		WARN_ON_ONCE(!test_bit(MD_RECOVERY_FROZEN, &mddev->recovery));
-		WARN_ON_ONCE(test_bit(MD_RECOVERY_RUNNING, &mddev->recovery));
+		WARN_ON_ONCE(rcu_dereference_protected(mddev->sync_thread,
+						       lockdep_is_held(&mddev->reconfig_mutex)));
 		clear_bit(RT_FLAG_RS_FROZEN, &rs->runtime_flags);
-		mddev_lock_nointr(mddev);
 		mddev->ro = 0;
 		mddev->in_sync = 0;
 		md_unfrozen_sync_thread(mddev);
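
The v2 change takes the lock before reading sync_thread, so that the lock justifies the dereference. A user-space sketch of that ordering follows, under stated assumptions: the boolean reconfig_locked stands in for reconfig_mutex, model_deref_protected() only imitates the intent of rcu_dereference_protected() with an assertion, and every model_* name is hypothetical. The unlock at the end exists only to keep the model self-contained; the hunk above does not show where the kernel code releases the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_thread {
	int id;
};

/* Hypothetical reduced model; not the kernel's struct mddev. */
struct model_mddev {
	bool reconfig_locked;             /* stands in for reconfig_mutex being held */
	struct model_thread *sync_thread; /* pointer protected by that lock */
};

static void model_lock(struct model_mddev *m)
{
	assert(!m->reconfig_locked);
	m->reconfig_locked = true;
}

static void model_unlock(struct model_mddev *m)
{
	assert(m->reconfig_locked);
	m->reconfig_locked = false;
}

/*
 * Imitates the intent of rcu_dereference_protected(ptr, cond): a plain
 * load whose safety is justified by a condition the caller must satisfy.
 * Here the condition is enforced with assert() instead of lockdep.
 */
static struct model_thread *model_deref_protected(const struct model_mddev *m)
{
	assert(m->reconfig_locked);
	return m->sync_thread;
}

/* Returns true when the modeled resume check would warn. */
static bool model_resume_check(struct model_mddev *m)
{
	bool warn;

	model_lock(m);                          /* lock first, as in v2 */
	warn = model_deref_protected(m) != NULL; /* then inspect the pointer */
	model_unlock(m);                        /* model-only simplification */
	return warn;
}
```

If model_deref_protected() were called before model_lock(), as the v1 ordering would have done, the assertion in this model would fire. This mirrors why the lock call was moved above the check.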