Message ID | 20220602134509.16662-1-guoqing.jiang@linux.dev (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | md: only unlock mddev from action_store | expand |
On 2022-06-02 07:45, Guoqing Jiang wrote: > The 07reshape5intr test is broke because of below path. > > md_reap_sync_thread > -> mddev_unlock > -> md_unregister_thread(&mddev->sync_thread) > > And md_check_recovery is triggered by, > > mddev_unlock -> md_wakeup_thread(mddev->thread) > > then mddev->reshape_position is set to MaxSector in raid5_finish_reshape > since MD_RECOVERY_INTR is cleared in md_check_recovery, which means > feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's > reshape_position can't be updated accordingly. > > Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread > with reconfig_mutex held") fixed is related with action_store path, other > callers which reap sync_thread didn't need to be changed, let's > > 1. only unlock mddev in md_reap_sync_thread if caller is action_store, > so the parameter is renamed to reflect the change. > 2. save some contexts (MD_RECOVERY_INTR and reshape_position) since they > could be changed by other processes, then restore them after get lock > again. > > Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held") > Reported-by: Logan Gunthorpe <logang@deltatee.com> > Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> > --- > I suppose the previous bug still can be fixed with the change, but it is > better to verify it. Donald, could you help to test the new code? > > Thanks, > Guoqing > > drivers/md/md.c | 24 ++++++++++++++++++------ > drivers/md/md.h | 2 +- > 2 files changed, 19 insertions(+), 7 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 5c8efef13881..3387260dd55b 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -6197,7 +6197,7 @@ static void __md_stop_writes(struct mddev *mddev) > flush_workqueue(md_misc_wq); > if (mddev->sync_thread) { > set_bit(MD_RECOVERY_INTR, &mddev->recovery); > - md_reap_sync_thread(mddev, true); > + md_reap_sync_thread(mddev, false); > } > > del_timer_sync(&mddev->safemode_timer); > @@ -9303,7 +9303,7 @@ void md_check_recovery(struct mddev *mddev) > * ->spare_active and clear saved_raid_disk > */ > set_bit(MD_RECOVERY_INTR, &mddev->recovery); > - md_reap_sync_thread(mddev, true); > + md_reap_sync_thread(mddev, false); > clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); > clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); > clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); > @@ -9338,7 +9338,7 @@ void md_check_recovery(struct mddev *mddev) > goto unlock; > } > if (mddev->sync_thread) { > - md_reap_sync_thread(mddev, true); > + md_reap_sync_thread(mddev, false); > goto unlock; > } > /* Set RUNNING before clearing NEEDED to avoid > @@ -9411,18 +9411,30 @@ void md_check_recovery(struct mddev *mddev) > } > EXPORT_SYMBOL(md_check_recovery); > > -void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held) > +void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev) > { > struct md_rdev *rdev; > sector_t old_dev_sectors = mddev->dev_sectors; > + sector_t old_reshape_position; > bool is_reshaped = false; > + bool is_interrupted = false; > > - if (reconfig_mutex_held) > + if (unlock_mddev) { > + old_reshape_position = mddev->reshape_position; > + if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) > + is_interrupted = true; > mddev_unlock(mddev); > + } > /* resync has finished, collect result */ > md_unregister_thread(&mddev->sync_thread); > - if (reconfig_mutex_held) > + if (unlock_mddev) { > mddev_lock_nointr(mddev); > + /* restore the previous flag and position */ > + mddev->reshape_position = old_reshape_position; > + if (is_interrupted) > + set_bit(MD_RECOVERY_INTR, &mddev->recovery); > + } Maybe instead of the ugly boolean argument we could pull md_unregister_thread() into all the callers and explicitly unlock in the single call site that needs it? Logan
On 6/2/22 3:45 PM, Guoqing Jiang wrote: > The 07reshape5intr test is broke because of below path. > > md_reap_sync_thread > -> mddev_unlock > -> md_unregister_thread(&mddev->sync_thread) > > And md_check_recovery is triggered by, > > mddev_unlock -> md_wakeup_thread(mddev->thread) > > then mddev->reshape_position is set to MaxSector in raid5_finish_reshape > since MD_RECOVERY_INTR is cleared in md_check_recovery, which means > feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's > reshape_position can't be updated accordingly. > > Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread > with reconfig_mutex held") fixed is related with action_store path, other > callers which reap sync_thread didn't need to be changed, let's > > 1. only unlock mddev in md_reap_sync_thread if caller is action_store, > so the parameter is renamed to reflect the change. > 2. save some contexts (MD_RECOVERY_INTR and reshape_position) since they > could be changed by other processes, then restore them after get lock > again. > > Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held") > Reported-by: Logan Gunthorpe <logang@deltatee.com> > Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> > --- > I suppose the previous bug still can be fixed with the change, but it is > better to verify it. Donald, could you help to test the new code? Sure. But Probably not before next week and there is a holiday in Germany on Monday. Best Donald > > Thanks, > Guoqing > > drivers/md/md.c | 24 ++++++++++++++++++------ > drivers/md/md.h | 2 +- > 2 files changed, 19 insertions(+), 7 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 5c8efef13881..3387260dd55b 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -6197,7 +6197,7 @@ static void __md_stop_writes(struct mddev *mddev) > flush_workqueue(md_misc_wq); > if (mddev->sync_thread) { > set_bit(MD_RECOVERY_INTR, &mddev->recovery); > - md_reap_sync_thread(mddev, true); > + md_reap_sync_thread(mddev, false); > } > > del_timer_sync(&mddev->safemode_timer); > @@ -9303,7 +9303,7 @@ void md_check_recovery(struct mddev *mddev) > * ->spare_active and clear saved_raid_disk > */ > set_bit(MD_RECOVERY_INTR, &mddev->recovery); > - md_reap_sync_thread(mddev, true); > + md_reap_sync_thread(mddev, false); > clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); > clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); > clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); > @@ -9338,7 +9338,7 @@ void md_check_recovery(struct mddev *mddev) > goto unlock; > } > if (mddev->sync_thread) { > - md_reap_sync_thread(mddev, true); > + md_reap_sync_thread(mddev, false); > goto unlock; > } > /* Set RUNNING before clearing NEEDED to avoid > @@ -9411,18 +9411,30 @@ void md_check_recovery(struct mddev *mddev) > } > EXPORT_SYMBOL(md_check_recovery); > > -void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held) > +void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev) > { > struct md_rdev *rdev; > sector_t old_dev_sectors = mddev->dev_sectors; > + sector_t old_reshape_position; > bool is_reshaped = false; > + bool is_interrupted = false; > > - if (reconfig_mutex_held) > + if (unlock_mddev) { > + old_reshape_position = mddev->reshape_position; > + if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) > + is_interrupted = true; > mddev_unlock(mddev); > + } > /* resync has finished, collect result */ > md_unregister_thread(&mddev->sync_thread); > - if (reconfig_mutex_held) > + if (unlock_mddev) { > mddev_lock_nointr(mddev); > + /* restore the previous flag and position */ > + mddev->reshape_position = old_reshape_position; > + if (is_interrupted) > + set_bit(MD_RECOVERY_INTR, &mddev->recovery); > + } > + > if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) && > !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) && > mddev->degraded != mddev->raid_disks) { > diff --git a/drivers/md/md.h b/drivers/md/md.h > index 82465a4b1588..3e3ff3b20e81 100644 > --- a/drivers/md/md.h > +++ b/drivers/md/md.h > @@ -719,7 +719,7 @@ extern struct md_thread *md_register_thread( > extern void md_unregister_thread(struct md_thread **threadp); > extern void md_wakeup_thread(struct md_thread *thread); > extern void md_check_recovery(struct mddev *mddev); > -extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held); > +extern void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev); > extern int mddev_init_writes_pending(struct mddev *mddev); > extern bool md_write_start(struct mddev *mddev, struct bio *bi); > extern void md_write_inc(struct mddev *mddev, struct bio *bi); >
On 6/3/22 2:03 AM, Logan Gunthorpe wrote: > > > On 2022-06-02 07:45, Guoqing Jiang wrote: >> The 07reshape5intr test is broke because of below path. >> >> md_reap_sync_thread >> -> mddev_unlock >> -> md_unregister_thread(&mddev->sync_thread) >> >> And md_check_recovery is triggered by, >> >> mddev_unlock -> md_wakeup_thread(mddev->thread) >> >> then mddev->reshape_position is set to MaxSector in raid5_finish_reshape >> since MD_RECOVERY_INTR is cleared in md_check_recovery, which means >> feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's >> reshape_position can't be updated accordingly. >> >> Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread >> with reconfig_mutex held") fixed is related with action_store path, other >> callers which reap sync_thread didn't need to be changed, let's >> >> 1. only unlock mddev in md_reap_sync_thread if caller is action_store, >> so the parameter is renamed to reflect the change. >> 2. save some contexts (MD_RECOVERY_INTR and reshape_position) since they >> could be changed by other processes, then restore them after get lock >> again. >> >> Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held") >> Reported-by: Logan Gunthorpe <logang@deltatee.com> >> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> >> --- >> I suppose the previous bug still can be fixed with the change, but it is >> better to verify it. Donald, could you help to test the new code? >> >> Thanks, >> Guoqing >> >> drivers/md/md.c | 24 ++++++++++++++++++------ >> drivers/md/md.h | 2 +- >> 2 files changed, 19 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/md/md.c b/drivers/md/md.c >> index 5c8efef13881..3387260dd55b 100644 >> --- a/drivers/md/md.c >> +++ b/drivers/md/md.c >> @@ -6197,7 +6197,7 @@ static void __md_stop_writes(struct mddev *mddev) >> flush_workqueue(md_misc_wq); >> if (mddev->sync_thread) { >> set_bit(MD_RECOVERY_INTR, &mddev->recovery); >> - md_reap_sync_thread(mddev, true); >> + md_reap_sync_thread(mddev, false); >> } >> >> del_timer_sync(&mddev->safemode_timer); >> @@ -9303,7 +9303,7 @@ void md_check_recovery(struct mddev *mddev) >> * ->spare_active and clear saved_raid_disk >> */ >> set_bit(MD_RECOVERY_INTR, &mddev->recovery); >> - md_reap_sync_thread(mddev, true); >> + md_reap_sync_thread(mddev, false); >> clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); >> clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); >> clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); >> @@ -9338,7 +9338,7 @@ void md_check_recovery(struct mddev *mddev) >> goto unlock; >> } >> if (mddev->sync_thread) { >> - md_reap_sync_thread(mddev, true); >> + md_reap_sync_thread(mddev, false); >> goto unlock; >> } >> /* Set RUNNING before clearing NEEDED to avoid >> @@ -9411,18 +9411,30 @@ void md_check_recovery(struct mddev *mddev) >> } >> EXPORT_SYMBOL(md_check_recovery); >> >> -void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held) >> +void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev) >> { >> struct md_rdev *rdev; >> sector_t old_dev_sectors = mddev->dev_sectors; >> + sector_t old_reshape_position; >> bool is_reshaped = false; >> + bool is_interrupted = false; >> >> - if (reconfig_mutex_held) >> + if (unlock_mddev) { >> + old_reshape_position = mddev->reshape_position; >> + if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) >> + is_interrupted = true; Hmm, this checking is not needed as it is always true from action_store. >> mddev_unlock(mddev); >> + } >> /* resync has finished, collect result */ >> md_unregister_thread(&mddev->sync_thread); >> - if (reconfig_mutex_held) >> + if (unlock_mddev) { >> mddev_lock_nointr(mddev); >> + /* restore the previous flag and position */ >> + mddev->reshape_position = old_reshape_position; >> + if (is_interrupted) >> + set_bit(MD_RECOVERY_INTR, &mddev->recovery); So is this one. >> + } > Maybe instead of the ugly boolean argument we could pull > md_unregister_thread() into all the callers and explicitly unlock in the > single call site that needs it? Sounds good, let me revert previous commit then pull it. Thanks, Guoqing
On 6/3/22 2:03 AM, Logan Gunthorpe wrote: > > > On 2022-06-02 07:45, Guoqing Jiang wrote: >> The 07reshape5intr test is broke because of below path. >> >> md_reap_sync_thread >> -> mddev_unlock >> -> md_unregister_thread(&mddev->sync_thread) >> >> And md_check_recovery is triggered by, >> >> mddev_unlock -> md_wakeup_thread(mddev->thread) >> >> then mddev->reshape_position is set to MaxSector in raid5_finish_reshape >> since MD_RECOVERY_INTR is cleared in md_check_recovery, which means >> feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's >> reshape_position can't be updated accordingly. >> >> Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread >> with reconfig_mutex held") fixed is related with action_store path, other >> callers which reap sync_thread didn't need to be changed, let's >> >> 1. only unlock mddev in md_reap_sync_thread if caller is action_store, >> so the parameter is renamed to reflect the change. >> 2. save some contexts (MD_RECOVERY_INTR and reshape_position) since they >> could be changed by other processes, then restore them after get lock >> again. >> >> Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held") >> Reported-by: Logan Gunthorpe <logang@deltatee.com> >> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> >> --- >> I suppose the previous bug still can be fixed with the change, but it is >> better to verify it. Donald, could you help to test the new code? >> >> Thanks, >> Guoqing >> >> drivers/md/md.c | 24 ++++++++++++++++++------ >> drivers/md/md.h | 2 +- >> 2 files changed, 19 insertions(+), 7 deletions(-) >> >> diff --git a/drivers/md/md.c b/drivers/md/md.c >> index 5c8efef13881..3387260dd55b 100644 >> --- a/drivers/md/md.c >> +++ b/drivers/md/md.c >> @@ -6197,7 +6197,7 @@ static void __md_stop_writes(struct mddev *mddev) >> flush_workqueue(md_misc_wq); >> if (mddev->sync_thread) { >> set_bit(MD_RECOVERY_INTR, &mddev->recovery); >> - md_reap_sync_thread(mddev, true); >> + md_reap_sync_thread(mddev, false); >> } >> >> del_timer_sync(&mddev->safemode_timer); >> @@ -9303,7 +9303,7 @@ void md_check_recovery(struct mddev *mddev) >> * ->spare_active and clear saved_raid_disk >> */ >> set_bit(MD_RECOVERY_INTR, &mddev->recovery); >> - md_reap_sync_thread(mddev, true); >> + md_reap_sync_thread(mddev, false); >> clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); >> clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); >> clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); >> @@ -9338,7 +9338,7 @@ void md_check_recovery(struct mddev *mddev) >> goto unlock; >> } >> if (mddev->sync_thread) { >> - md_reap_sync_thread(mddev, true); >> + md_reap_sync_thread(mddev, false); >> goto unlock; >> } >> /* Set RUNNING before clearing NEEDED to avoid >> @@ -9411,18 +9411,30 @@ void md_check_recovery(struct mddev *mddev) >> } >> EXPORT_SYMBOL(md_check_recovery); >> >> -void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held) >> +void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev) >> { >> struct md_rdev *rdev; >> sector_t old_dev_sectors = mddev->dev_sectors; >> + sector_t old_reshape_position; >> bool is_reshaped = false; >> + bool is_interrupted = false; >> >> - if (reconfig_mutex_held) >> + if (unlock_mddev) { >> + old_reshape_position = mddev->reshape_position; >> + if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) >> + is_interrupted = true; >> mddev_unlock(mddev); >> + } >> /* resync has finished, collect result */ >> md_unregister_thread(&mddev->sync_thread); >> - if (reconfig_mutex_held) >> + if (unlock_mddev) { >> mddev_lock_nointr(mddev); >> + /* restore the previous flag and position */ >> + mddev->reshape_position = old_reshape_position; >> + if (is_interrupted) >> + set_bit(MD_RECOVERY_INTR, &mddev->recovery); >> + } > Maybe instead of the ugly boolean argument we could pull > md_unregister_thread() into all the callers and explicitly unlock in the > single call site that needs it? After move "md_unregister_thread(&mddev->sync_thread)", then we need to rename md_reap_sync_thread given it doesn't unregister sync_thread, any suggestion about the new name? md_behind_reap_sync_thread? Thanks, Guoqing
On Mon, Jun 6, 2022 at 11:08 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote: > > > > >> + } > > Maybe instead of the ugly boolean argument we could pull > > md_unregister_thread() into all the callers and explicitly unlock in the > > single call site that needs it? > > Sounds good, let me revert previous commit then pull it. > Hi all Now md_reap_sync_thread is called by __md_stop_writes, action_store and md_check_recovery. If we move md_unregister_thread out of md_reap_sync_thread and unlock reconfig_mutex before calling md_unregister_thread, it means we break the atomic action for these three functions mentioned above. Not sure if it can introduce other problems, especially for the change of md_check_recovery. How about suspend I/O when changing the sync action? It should not hurt the performance, because it only interrupts the sync action and flush the in memory stripes. For this way, it needs to revert 7e6ba434cc60 (md: don't unregister sync_thread with reconfig_mutex held) and changes like this: diff --git a/drivers/md/md.c b/drivers/md/md.c index ce9d2845d3ac..af28cdeaf7e1 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -4827,12 +4827,14 @@ action_store(struct mddev *mddev, const char *page, size_t len) clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && mddev_lock(mddev) == 0) { + mddev_suspend(mddev); if (work_pending(&mddev->del_work)) flush_workqueue(md_misc_wq); if (mddev->sync_thread) { set_bit(MD_RECOVERY_INTR, &mddev->recovery); md_reap_sync_thread(mddev); } + mddev_resume(mddev); mddev_unlock(mddev); } } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) mddev_suspend can stop the I/O and the superblock can be updated too. Because MD_ALLOW_SB_UPDATE is set. So raid5d can go on handling stripes. Best Regards Xiao
On 6/6/22 4:39 PM, Xiao Ni wrote: > On Mon, Jun 6, 2022 at 11:08 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote: >> >> >>>> + } >>> Maybe instead of the ugly boolean argument we could pull >>> md_unregister_thread() into all the callers and explicitly unlock in the >>> single call site that needs it? >> Sounds good, let me revert previous commit then pull it. >> > Hi all > > Now md_reap_sync_thread is called by __md_stop_writes, action_store > and md_check_recovery. > If we move md_unregister_thread out of md_reap_sync_thread and unlock > reconfig_mutex before > calling md_unregister_thread, it means we break the atomic action for > these three functions mentioned > above. No, only action_store need to be changed, other callers keep the original behavior. > Not sure if it can introduce other problems, especially for the > change of md_check_recovery. > > How about suspend I/O when changing the sync action? It should not > hurt the performance, because > it only interrupts the sync action and flush the in memory stripes. > For this way, it needs to revert > 7e6ba434cc60 (md: don't unregister sync_thread with reconfig_mutex > held) and changes like this: > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index ce9d2845d3ac..af28cdeaf7e1 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -4827,12 +4827,14 @@ action_store(struct mddev *mddev, const char > *page, size_t len) > clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); > if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && > mddev_lock(mddev) == 0) { > + mddev_suspend(mddev); > if (work_pending(&mddev->del_work)) > flush_workqueue(md_misc_wq); > if (mddev->sync_thread) { > set_bit(MD_RECOVERY_INTR, &mddev->recovery); > md_reap_sync_thread(mddev); > } > + mddev_resume(mddev); > mddev_unlock(mddev); > } > } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) The action actually means sync action which are for internal IO instead of external IO, I suppose the semantic is different with above change. And there are lots of sync/locking actions in suspend path, I am not sure it will cause other locking issues, you may need to investigate it. Thanks, Guoqing
On Mon, Jun 6, 2022 at 5:00 PM Guoqing Jiang <guoqing.jiang@linux.dev> wrote: > > > > On 6/6/22 4:39 PM, Xiao Ni wrote: > > On Mon, Jun 6, 2022 at 11:08 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote: > >> > >> > >>>> + } > >>> Maybe instead of the ugly boolean argument we could pull > >>> md_unregister_thread() into all the callers and explicitly unlock in the > >>> single call site that needs it? > >> Sounds good, let me revert previous commit then pull it. > >> > > Hi all > > > > Now md_reap_sync_thread is called by __md_stop_writes, action_store > > and md_check_recovery. > > If we move md_unregister_thread out of md_reap_sync_thread and unlock > > reconfig_mutex before > > calling md_unregister_thread, it means we break the atomic action for > > these three functions mentioned > > above. > > No, only action_store need to be changed, other callers keep the original > behavior. Hi Guoqing Thanks for pointing out this. > > > Not sure if it can introduce other problems, especially for the > > change of md_check_recovery. > > > > How about suspend I/O when changing the sync action? It should not > > hurt the performance, because > > it only interrupts the sync action and flush the in memory stripes. > > For this way, it needs to revert > > 7e6ba434cc60 (md: don't unregister sync_thread with reconfig_mutex > > held) and changes like this: > > > > diff --git a/drivers/md/md.c b/drivers/md/md.c > > index ce9d2845d3ac..af28cdeaf7e1 100644 > > --- a/drivers/md/md.c > > +++ b/drivers/md/md.c > > @@ -4827,12 +4827,14 @@ action_store(struct mddev *mddev, const char > > *page, size_t len) > > clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); > > if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && > > mddev_lock(mddev) == 0) { > > + mddev_suspend(mddev); > > if (work_pending(&mddev->del_work)) > > flush_workqueue(md_misc_wq); > > if (mddev->sync_thread) { > > set_bit(MD_RECOVERY_INTR, &mddev->recovery); > > md_reap_sync_thread(mddev); > > } > > + mddev_resume(mddev); > > mddev_unlock(mddev); > > } > > } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) > > The action actually means sync action which are for internal IO instead > of external IO, I suppose the semantic is different with above change. The original problem should be i/o happen when interrupting the sync thread? So the external i/o calls md_write_start and sets MD_SB_CHANGE_PENDING. Then raid5d->md_check_recovery can't update superblock and handle internal stripes. So the `echo idle` action is stuck. > > And there are lots of sync/locking actions in suspend path, I am not > sure it will cause other locking issues, you may need to investigate it. Yes, mddev_supsend needs those sync methods to keep the raid in a quiescent state. First, it needs to get reconfig_mutex before calling mddev_supsend. So it can avoid racing with all the actions that want to change the raid. Then it waits mddev->active_io to 0 which means all submit bio processes stop submitting io. Then it waits until all internal i/os finish by pers->quiesce. Last it waits until the superblock is updated. From the logic, it looks safe. By the way, the patch 35bfc52187f (md: allow metadata update while suspending) which introduces MD_UPDATING_SB and MD_UPDATING_SB fixes the similar deadlock problem. So in this problem, it looks like mddev_suspend is a good choice. As you mentioned, it needs to consider and do tests more because of the complex sync/locking actions. Best Regards Xiao
On 6/6/22 5:36 PM, Xiao Ni wrote: >>> @@ -4827,12 +4827,14 @@ action_store(struct mddev *mddev, const char >>> *page, size_t len) >>> clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); >>> if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && >>> mddev_lock(mddev) == 0) { >>> + mddev_suspend(mddev); >>> if (work_pending(&mddev->del_work)) >>> flush_workqueue(md_misc_wq); >>> if (mddev->sync_thread) { >>> set_bit(MD_RECOVERY_INTR, &mddev->recovery); >>> md_reap_sync_thread(mddev); >>> } >>> + mddev_resume(mddev); >>> mddev_unlock(mddev); >>> } >>> } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) >> The action actually means sync action which are for internal IO instead >> of external IO, I suppose the semantic is different with above change. > The original problem should be i/o happen when interrupting the sync thread? > So the external i/o calls md_write_start and sets MD_SB_CHANGE_PENDING. > Then raid5d->md_check_recovery can't update superblock and handle internal > stripes. So the `echo idle` action is stuck. My point is action_store shouldn't disturb external IO, it should only deal with sync IO and relevant stuffs as the function is for sync_action node, no? Thanks, Guoqing
On Mon, Jun 6, 2022 at 5:55 PM Guoqing Jiang <guoqing.jiang@linux.dev> wrote: > > > > On 6/6/22 5:36 PM, Xiao Ni wrote: > >>> @@ -4827,12 +4827,14 @@ action_store(struct mddev *mddev, const char > >>> *page, size_t len) > >>> clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); > >>> if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) && > >>> mddev_lock(mddev) == 0) { > >>> + mddev_suspend(mddev); > >>> if (work_pending(&mddev->del_work)) > >>> flush_workqueue(md_misc_wq); > >>> if (mddev->sync_thread) { > >>> set_bit(MD_RECOVERY_INTR, &mddev->recovery); > >>> md_reap_sync_thread(mddev); > >>> } > >>> + mddev_resume(mddev); > >>> mddev_unlock(mddev); > >>> } > >>> } else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) > >> The action actually means sync action which are for internal IO instead > >> of external IO, I suppose the semantic is different with above change. > > The original problem should be i/o happen when interrupting the sync thread? > > So the external i/o calls md_write_start and sets MD_SB_CHANGE_PENDING. > > Then raid5d->md_check_recovery can't update superblock and handle internal > > stripes. So the `echo idle` action is stuck. > > My point is action_store shouldn't disturb external IO, it should only > deal with > sync IO and relevant stuffs as the function is for sync_action node, no? Yes, it's a change that was mentioned above. It depends on which way we want to go. In some sysfs file store functions, it also calls mddev_suspend. For some situations, it needs to choose if we need to stop the i/o temporarily. Anyway, it's only a possible method I want to discuss. It's very good for me to dig into this problem. Now I have a better understanding of those codes :) Best Regards Xiao
On 2022-06-05 21:29, Guoqing Jiang wrote: > > > On 6/3/22 2:03 AM, Logan Gunthorpe wrote: >> >> >> On 2022-06-02 07:45, Guoqing Jiang wrote: >>> The 07reshape5intr test is broke because of below path. >>> >>> md_reap_sync_thread >>> -> mddev_unlock >>> -> md_unregister_thread(&mddev->sync_thread) >>> >>> And md_check_recovery is triggered by, >>> >>> mddev_unlock -> md_wakeup_thread(mddev->thread) >>> >>> then mddev->reshape_position is set to MaxSector in raid5_finish_reshape >>> since MD_RECOVERY_INTR is cleared in md_check_recovery, which means >>> feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's >>> reshape_position can't be updated accordingly. >>> >>> Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread >>> with reconfig_mutex held") fixed is related with action_store path, other >>> callers which reap sync_thread didn't need to be changed, let's >>> >>> 1. only unlock mddev in md_reap_sync_thread if caller is action_store, >>> so the parameter is renamed to reflect the change. >>> 2. save some contexts (MD_RECOVERY_INTR and reshape_position) since they >>> could be changed by other processes, then restore them after get lock >>> again. >>> >>> Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held") >>> Reported-by: Logan Gunthorpe <logang@deltatee.com> >>> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> >>> --- >>> I suppose the previous bug still can be fixed with the change, but it is >>> better to verify it. Donald, could you help to test the new code? >>> >>> Thanks, >>> Guoqing >>> >>> drivers/md/md.c | 24 ++++++++++++++++++------ >>> drivers/md/md.h | 2 +- >>> 2 files changed, 19 insertions(+), 7 deletions(-) >>> >>> diff --git a/drivers/md/md.c b/drivers/md/md.c >>> index 5c8efef13881..3387260dd55b 100644 >>> --- a/drivers/md/md.c >>> +++ b/drivers/md/md.c >>> @@ -6197,7 +6197,7 @@ static void __md_stop_writes(struct mddev *mddev) >>> flush_workqueue(md_misc_wq); >>> if (mddev->sync_thread) { >>> set_bit(MD_RECOVERY_INTR, &mddev->recovery); >>> - md_reap_sync_thread(mddev, true); >>> + md_reap_sync_thread(mddev, false); >>> } >>> >>> del_timer_sync(&mddev->safemode_timer); >>> @@ -9303,7 +9303,7 @@ void md_check_recovery(struct mddev *mddev) >>> * ->spare_active and clear saved_raid_disk >>> */ >>> set_bit(MD_RECOVERY_INTR, &mddev->recovery); >>> - md_reap_sync_thread(mddev, true); >>> + md_reap_sync_thread(mddev, false); >>> clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); >>> clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); >>> clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); >>> @@ -9338,7 +9338,7 @@ void md_check_recovery(struct mddev *mddev) >>> goto unlock; >>> } >>> if (mddev->sync_thread) { >>> - md_reap_sync_thread(mddev, true); >>> + md_reap_sync_thread(mddev, false); >>> goto unlock; >>> } >>> /* Set RUNNING before clearing NEEDED to avoid >>> @@ -9411,18 +9411,30 @@ void md_check_recovery(struct mddev *mddev) >>> } >>> EXPORT_SYMBOL(md_check_recovery); >>> >>> -void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held) >>> +void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev) >>> { >>> struct md_rdev *rdev; >>> sector_t old_dev_sectors = mddev->dev_sectors; >>> + sector_t old_reshape_position; >>> bool is_reshaped = false; >>> + bool is_interrupted = false; >>> >>> - if (reconfig_mutex_held) >>> + if (unlock_mddev) { >>> + old_reshape_position = mddev->reshape_position; >>> + if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) >>> + is_interrupted = true; >>> mddev_unlock(mddev); >>> + } >>> /* resync has finished, collect result */ >>> md_unregister_thread(&mddev->sync_thread); >>> - if (reconfig_mutex_held) >>> + if (unlock_mddev) { >>> mddev_lock_nointr(mddev); >>> + /* restore the previous flag and position */ >>> + mddev->reshape_position = old_reshape_position; >>> + if (is_interrupted) >>> + set_bit(MD_RECOVERY_INTR, &mddev->recovery); >>> + } >> Maybe instead of the ugly boolean argument we could pull >> md_unregister_thread() into all the callers and explicitly unlock in the >> single call site that needs it? > > After move "md_unregister_thread(&mddev->sync_thread)", then we need to > rename md_reap_sync_thread given it doesn't unregister sync_thread, any > suggestion about the new name? md_behind_reap_sync_thread? I don't like the "behind"... Which would be a name suggesting when the function should be called, not what the function does. I'd maybe go with something like md_cleanup_sync_thread() Logan
On 6/6/22 11:48 PM, Logan Gunthorpe wrote: >> After move "md_unregister_thread(&mddev->sync_thread)", then we need to >> rename md_reap_sync_thread given it doesn't unregister sync_thread, any >> suggestion about the new name? md_behind_reap_sync_thread? > I don't like the "behind"... Which would be a name suggesting when the > function should be called, not what the function does. Right, naming is hard. > I'd maybe go with something like md_cleanup_sync_thread() It is confusing since it doesn't related with sync_thread anymore after the pull. I will keep the name unchanged for now. Thanks, Guoqing
diff --git a/drivers/md/md.c b/drivers/md/md.c index 5c8efef13881..3387260dd55b 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -6197,7 +6197,7 @@ static void __md_stop_writes(struct mddev *mddev) flush_workqueue(md_misc_wq); if (mddev->sync_thread) { set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_reap_sync_thread(mddev, true); + md_reap_sync_thread(mddev, false); } del_timer_sync(&mddev->safemode_timer); @@ -9303,7 +9303,7 @@ void md_check_recovery(struct mddev *mddev) * ->spare_active and clear saved_raid_disk */ set_bit(MD_RECOVERY_INTR, &mddev->recovery); - md_reap_sync_thread(mddev, true); + md_reap_sync_thread(mddev, false); clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery); clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery); clear_bit(MD_SB_CHANGE_PENDING, &mddev->sb_flags); @@ -9338,7 +9338,7 @@ void md_check_recovery(struct mddev *mddev) goto unlock; } if (mddev->sync_thread) { - md_reap_sync_thread(mddev, true); + md_reap_sync_thread(mddev, false); goto unlock; } /* Set RUNNING before clearing NEEDED to avoid @@ -9411,18 +9411,30 @@ void md_check_recovery(struct mddev *mddev) } EXPORT_SYMBOL(md_check_recovery); -void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held) +void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev) { struct md_rdev *rdev; sector_t old_dev_sectors = mddev->dev_sectors; + sector_t old_reshape_position; bool is_reshaped = false; + bool is_interrupted = false; - if (reconfig_mutex_held) + if (unlock_mddev) { + old_reshape_position = mddev->reshape_position; + if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) + is_interrupted = true; mddev_unlock(mddev); + } /* resync has finished, collect result */ md_unregister_thread(&mddev->sync_thread); - if (reconfig_mutex_held) + if (unlock_mddev) { mddev_lock_nointr(mddev); + /* restore the previous flag and position */ + mddev->reshape_position = old_reshape_position; + if (is_interrupted) + set_bit(MD_RECOVERY_INTR, &mddev->recovery); + } + if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) && mddev->degraded != mddev->raid_disks) { diff --git a/drivers/md/md.h b/drivers/md/md.h index 82465a4b1588..3e3ff3b20e81 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -719,7 +719,7 @@ extern struct md_thread *md_register_thread( extern void md_unregister_thread(struct md_thread **threadp); extern void md_wakeup_thread(struct md_thread *thread); extern void md_check_recovery(struct mddev *mddev); -extern void md_reap_sync_thread(struct mddev *mddev, bool reconfig_mutex_held); +extern void md_reap_sync_thread(struct mddev *mddev, bool unlock_mddev); extern int mddev_init_writes_pending(struct mddev *mddev); extern bool md_write_start(struct mddev *mddev, struct bio *bi); extern void md_write_inc(struct mddev *mddev, struct bio *bi);
The 07reshape5intr test is broke because of below path. md_reap_sync_thread -> mddev_unlock -> md_unregister_thread(&mddev->sync_thread) And md_check_recovery is triggered by, mddev_unlock -> md_wakeup_thread(mddev->thread) then mddev->reshape_position is set to MaxSector in raid5_finish_reshape since MD_RECOVERY_INTR is cleared in md_check_recovery, which means feature_map is not set with MD_FEATURE_RESHAPE_ACTIVE and superblock's reshape_position can't be updated accordingly. Since the bug which commit 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held") fixed is related with action_store path, other callers which reap sync_thread didn't need to be changed, let's 1. only unlock mddev in md_reap_sync_thread if caller is action_store, so the parameter is renamed to reflect the change. 2. save some contexts (MD_RECOVERY_INTR and reshape_position) since they could be changed by other processes, then restore them after get lock again. Fixes: 8b48ec23cc51a ("md: don't unregister sync_thread with reconfig_mutex held") Reported-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> --- I suppose the previous bug still can be fixed with the change, but it is better to verify it. Donald, could you help to test the new code? Thanks, Guoqing drivers/md/md.c | 24 ++++++++++++++++++------ drivers/md/md.h | 2 +- 2 files changed, 19 insertions(+), 7 deletions(-)