Message ID | 20241105075733.66101-1-xni@redhat.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | [RFC,1/1] md: Use pers->quiesce in mddev_suspend | expand |
Hi, 在 2024/11/05 15:57, Xiao Ni 写道: > One customer reports a bug: raid5 is hung when changing thread cnt > while resync is running. The stripes are all in conf->handle_list > and new threads can't handle them. > > Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes > pers->quiesce from mddev_suspend/resume, then we can't guarantee sync > requests finish in suspend operation. One personality knows itself the > best. So pers->quiesce is a proper way to let personality quiesce. Do you mean that other than normal IO, raid5 expects sync IO to be done as well by mddev_suspend()? If so, we'd better add some comments as well. Thanks, Kuai > > Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()") > Signed-off-by: Xiao Ni <xni@redhat.com> > --- > drivers/md/md.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 67108c397c5a..7409ecb2df68 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible) > return err; > } > > + if (mddev->pers) > + mddev->pers->quiesce(mddev, 1); > + > /* > * For raid456, io might be waiting for reshape to make progress, > * allow new reshape to start while waiting for io to be done to > @@ -514,6 +517,9 @@ static void __mddev_resume(struct mddev *mddev, bool recovery_needed) > percpu_ref_resurrect(&mddev->active_io); > wake_up(&mddev->sb_wait); > > + if (mddev->pers) > + mddev->pers->quiesce(mddev, 0); > + > if (recovery_needed) > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); > md_wakeup_thread(mddev->thread); >
On Tue, 5 Nov 2024 15:57:33 +0800 Xiao Ni <xni@redhat.com> wrote: > One customer reports a bug: raid5 is hung when changing thread cnt > while resync is running. The stripes are all in conf->handle_list > and new threads can't handle them. Is issue fixed with this patch? Is is missing here :) > > Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes > pers->quiesce from mddev_suspend/resume, then we can't guarantee sync > requests finish in suspend operation. One personality knows itself the > best. So pers->quiesce is a proper way to let personality quiesce. > > Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()") > Signed-off-by: Xiao Ni <xni@redhat.com> > --- > drivers/md/md.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 67108c397c5a..7409ecb2df68 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible) > return err; > } > > + if (mddev->pers) > + mddev->pers->quiesce(mddev, 1); > + Shouldn't it be implemented as below? According to b39f35ebe86d, some levels are not implementing this? + if (mddev->pers && mddev->pers->quiesce) + mddev->pers->quiesce(mddev, 1); Is it reproducible with upstream kernel? Thanks, Mariusz
Hi, 在 2024/11/05 16:16, Mariusz Tkaczyk 写道: > On Tue, 5 Nov 2024 15:57:33 +0800 > Xiao Ni <xni@redhat.com> wrote: > >> One customer reports a bug: raid5 is hung when changing thread cnt >> while resync is running. The stripes are all in conf->handle_list >> and new threads can't handle them. > > Is issue fixed with this patch? Is is missing here :) >> >> Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes >> pers->quiesce from mddev_suspend/resume, then we can't guarantee sync >> requests finish in suspend operation. One personality knows itself the >> best. So pers->quiesce is a proper way to let personality quiesce. > >> >> Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()") >> Signed-off-by: Xiao Ni <xni@redhat.com> >> --- >> drivers/md/md.c | 6 ++++++ >> 1 file changed, 6 insertions(+) >> >> diff --git a/drivers/md/md.c b/drivers/md/md.c >> index 67108c397c5a..7409ecb2df68 100644 >> --- a/drivers/md/md.c >> +++ b/drivers/md/md.c >> @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible) >> return err; >> } >> >> + if (mddev->pers) >> + mddev->pers->quiesce(mddev, 1); >> + > > Shouldn't it be implemented as below? According to b39f35ebe86d, some levels are > not implementing this? > > + if (mddev->pers && mddev->pers->quiesce) > + mddev->pers->quiesce(mddev, 1); It's fine, the fops is never NULL, just some levels points to an empty function. The tricky part here is that accessing "mddev->pers" is not safe here, 'reconfig_mutex' is not held in mddev_suspend(), and can concurrent with, for example, change levels. :( Thanks, Kuai > > Is it reproducible with upstream kernel? > > Thanks, > Mariusz > > . >
On Tue, Nov 5, 2024 at 4:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > Hi, > > 在 2024/11/05 15:57, Xiao Ni 写道: > > One customer reports a bug: raid5 is hung when changing thread cnt > > while resync is running. The stripes are all in conf->handle_list > > and new threads can't handle them. > > > > Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes > > pers->quiesce from mddev_suspend/resume, then we can't guarantee sync > > requests finish in suspend operation. One personality knows itself the > > best. So pers->quiesce is a proper way to let personality quiesce. > > Do you mean that other than normal IO, raid5 expects sync IO to be done > as well by mddev_suspend()? If so, we'd better add some comments as > well. Hi Kuai Yes, before patch b39f35ebe86d, mddev suspend can guarantee all io (normal io from filesystem and sync requests) finish. And raid5 conf->handle_list have normal io and sync io, so it should wait all these io finish in suspend operation. I'll add more comments. Regards Xiao > > Thanks, > Kuai > > > > > Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()") > > Signed-off-by: Xiao Ni <xni@redhat.com> > > --- > > drivers/md/md.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/drivers/md/md.c b/drivers/md/md.c > > index 67108c397c5a..7409ecb2df68 100644 > > --- a/drivers/md/md.c > > +++ b/drivers/md/md.c > > @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible) > > return err; > > } > > > > + if (mddev->pers) > > + mddev->pers->quiesce(mddev, 1); > > + > > /* > > * For raid456, io might be waiting for reshape to make progress, > > * allow new reshape to start while waiting for io to be done to > > @@ -514,6 +517,9 @@ static void __mddev_resume(struct mddev *mddev, bool recovery_needed) > > percpu_ref_resurrect(&mddev->active_io); > > wake_up(&mddev->sb_wait); > > > > + if (mddev->pers) > > + mddev->pers->quiesce(mddev, 0); > > + > > if (recovery_needed) > > set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); > > md_wakeup_thread(mddev->thread); > > >
On Tue, Nov 5, 2024 at 4:29 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > Hi, > > 在 2024/11/05 16:16, Mariusz Tkaczyk 写道: > > On Tue, 5 Nov 2024 15:57:33 +0800 > > Xiao Ni <xni@redhat.com> wrote: > > > >> One customer reports a bug: raid5 is hung when changing thread cnt > >> while resync is running. The stripes are all in conf->handle_list > >> and new threads can't handle them. > > > > Is issue fixed with this patch? Is is missing here :) > >> > >> Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes > >> pers->quiesce from mddev_suspend/resume, then we can't guarantee sync > >> requests finish in suspend operation. One personality knows itself the > >> best. So pers->quiesce is a proper way to let personality quiesce. > > > >> > >> Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()") > >> Signed-off-by: Xiao Ni <xni@redhat.com> > >> --- > >> drivers/md/md.c | 6 ++++++ > >> 1 file changed, 6 insertions(+) > >> > >> diff --git a/drivers/md/md.c b/drivers/md/md.c > >> index 67108c397c5a..7409ecb2df68 100644 > >> --- a/drivers/md/md.c > >> +++ b/drivers/md/md.c > >> @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible) > >> return err; > >> } > >> > >> + if (mddev->pers) > >> + mddev->pers->quiesce(mddev, 1); > >> + > > > > Shouldn't it be implemented as below? According to b39f35ebe86d, some levels are > > not implementing this? > > > > + if (mddev->pers && mddev->pers->quiesce) > > + mddev->pers->quiesce(mddev, 1); > > It's fine, the fops is never NULL, just some levels points to an empty > function. > > The tricky part here is that accessing "mddev->pers" is not safe here, > 'reconfig_mutex' is not held in mddev_suspend(), and can concurrent > with, for example, change levels. :( Hi Kuai Now mddev->suspended is protected by mddev->suspend_mutex. It should can avoid the problem you mentioned above? level_store calls mddev_suspend and mddev->suspended is set to mddev->suspended+1. So other paths will return because mddev->suspended is not 0. Regards Xiao > > Thanks, > Kuai > > > > > Is it reproducible with upstream kernel? > > > > Thanks, > > Mariusz > > > > . > > >
Hi, 在 2024/11/05 17:01, Xiao Ni 写道: > On Tue, Nov 5, 2024 at 4:29 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: >> >> Hi, >> >> 在 2024/11/05 16:16, Mariusz Tkaczyk 写道: >>> On Tue, 5 Nov 2024 15:57:33 +0800 >>> Xiao Ni <xni@redhat.com> wrote: >>> >>>> One customer reports a bug: raid5 is hung when changing thread cnt >>>> while resync is running. The stripes are all in conf->handle_list >>>> and new threads can't handle them. >>> >>> Is issue fixed with this patch? Is is missing here :) >>>> >>>> Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes >>>> pers->quiesce from mddev_suspend/resume, then we can't guarantee sync >>>> requests finish in suspend operation. One personality knows itself the >>>> best. So pers->quiesce is a proper way to let personality quiesce. >>> >>>> >>>> Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()") >>>> Signed-off-by: Xiao Ni <xni@redhat.com> >>>> --- >>>> drivers/md/md.c | 6 ++++++ >>>> 1 file changed, 6 insertions(+) >>>> >>>> diff --git a/drivers/md/md.c b/drivers/md/md.c >>>> index 67108c397c5a..7409ecb2df68 100644 >>>> --- a/drivers/md/md.c >>>> +++ b/drivers/md/md.c >>>> @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible) >>>> return err; >>>> } >>>> >>>> + if (mddev->pers) >>>> + mddev->pers->quiesce(mddev, 1); >>>> + >>> >>> Shouldn't it be implemented as below? According to b39f35ebe86d, some levels are >>> not implementing this? >>> >>> + if (mddev->pers && mddev->pers->quiesce) >>> + mddev->pers->quiesce(mddev, 1); >> >> It's fine, the fops is never NULL, just some levels points to an empty >> function. >> >> The tricky part here is that accessing "mddev->pers" is not safe here, >> 'reconfig_mutex' is not held in mddev_suspend(), and can concurrent >> with, for example, change levels. :( > > Hi Kuai > > Now mddev->suspended is protected by mddev->suspend_mutex. It should > can avoid the problem you mentioned above? level_store calls > mddev_suspend and mddev->suspended is set to mddev->suspended+1. So > other paths will return because mddev->suspended is not 0. Yeah, level store is just an wrong example, key point here is that mddev->pers is not protected by 'suspend_mutex'. Other places are md_run() and md_stop(), these path doesn't hold 'suspend_mutex' right? And they look like can concurrent with suspend, for example, suspend_lo_store() Did you consider just fail raid5_store_group_thread_cnt() if sync_thread is active? Or call ->quiesce() directly here with 'reconfig_mutex' held before suspend? Thanks, Kuai > > Regards > Xiao >> >> Thanks, >> Kuai >> >>> >>> Is it reproducible with upstream kernel? >>> >>> Thanks, >>> Mariusz >>> >>> . >>> >> > > > . >
On Tue, Nov 5, 2024 at 5:22 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > > Hi, > > 在 2024/11/05 17:01, Xiao Ni 写道: > > On Tue, Nov 5, 2024 at 4:29 PM Yu Kuai <yukuai1@huaweicloud.com> wrote: > >> > >> Hi, > >> > >> 在 2024/11/05 16:16, Mariusz Tkaczyk 写道: > >>> On Tue, 5 Nov 2024 15:57:33 +0800 > >>> Xiao Ni <xni@redhat.com> wrote: > >>> > >>>> One customer reports a bug: raid5 is hung when changing thread cnt > >>>> while resync is running. The stripes are all in conf->handle_list > >>>> and new threads can't handle them. > >>> > >>> Is issue fixed with this patch? Is is missing here :) > >>>> > >>>> Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes > >>>> pers->quiesce from mddev_suspend/resume, then we can't guarantee sync > >>>> requests finish in suspend operation. One personality knows itself the > >>>> best. So pers->quiesce is a proper way to let personality quiesce. > >>> > >>>> > >>>> Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()") > >>>> Signed-off-by: Xiao Ni <xni@redhat.com> > >>>> --- > >>>> drivers/md/md.c | 6 ++++++ > >>>> 1 file changed, 6 insertions(+) > >>>> > >>>> diff --git a/drivers/md/md.c b/drivers/md/md.c > >>>> index 67108c397c5a..7409ecb2df68 100644 > >>>> --- a/drivers/md/md.c > >>>> +++ b/drivers/md/md.c > >>>> @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible) > >>>> return err; > >>>> } > >>>> > >>>> + if (mddev->pers) > >>>> + mddev->pers->quiesce(mddev, 1); > >>>> + > >>> > >>> Shouldn't it be implemented as below? According to b39f35ebe86d, some levels are > >>> not implementing this? > >>> > >>> + if (mddev->pers && mddev->pers->quiesce) > >>> + mddev->pers->quiesce(mddev, 1); > >> > >> It's fine, the fops is never NULL, just some levels points to an empty > >> function. > >> > >> The tricky part here is that accessing "mddev->pers" is not safe here, > >> 'reconfig_mutex' is not held in mddev_suspend(), and can concurrent > >> with, for example, change levels. :( > > > > Hi Kuai > > > > Now mddev->suspended is protected by mddev->suspend_mutex. It should > > can avoid the problem you mentioned above? level_store calls > > mddev_suspend and mddev->suspended is set to mddev->suspended+1. So > > other paths will return because mddev->suspended is not 0. > > Yeah, level store is just an wrong example, key point here is that > mddev->pers is not protected by 'suspend_mutex'. Other places are > md_run() and md_stop(), these path doesn't hold 'suspend_mutex' right? > And they look like can concurrent with suspend, for example, > suspend_lo_store() Yes, thanks for pointing out this. I need to think more about this. > > Did you consider just fail raid5_store_group_thread_cnt() if sync_thread > is active? It's one method, but not a good one. > > Or call ->quiesce() directly here with 'reconfig_mutex' held before > suspend? I'll think this way. Thanks! Regards Xiao > > Thanks, > Kuai > > > > > Regards > > Xiao > >> > >> Thanks, > >> Kuai > >> > >>> > >>> Is it reproducible with upstream kernel? > >>> > >>> Thanks, > >>> Mariusz > >>> > >>> . > >>> > >> > > > > > > . > > >
diff --git a/drivers/md/md.c b/drivers/md/md.c index 67108c397c5a..7409ecb2df68 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible) return err; } + if (mddev->pers) + mddev->pers->quiesce(mddev, 1); + /* * For raid456, io might be waiting for reshape to make progress, * allow new reshape to start while waiting for io to be done to @@ -514,6 +517,9 @@ static void __mddev_resume(struct mddev *mddev, bool recovery_needed) percpu_ref_resurrect(&mddev->active_io); wake_up(&mddev->sb_wait); + if (mddev->pers) + mddev->pers->quiesce(mddev, 0); + if (recovery_needed) set_bit(MD_RECOVERY_NEEDED, &mddev->recovery); md_wakeup_thread(mddev->thread);
One customer reports a bug: raid5 is hung when changing thread cnt while resync is running. The stripes are all in conf->handle_list and new threads can't handle them. Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes pers->quiesce from mddev_suspend/resume, then we can't guarantee sync requests finish in suspend operation. One personality knows itself the best. So pers->quiesce is a proper way to let personality quiesce. Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()") Signed-off-by: Xiao Ni <xni@redhat.com> --- drivers/md/md.c | 6 ++++++ 1 file changed, 6 insertions(+)