diff mbox series

[RFC,1/1] md: Use pers->quiesce in mddev_suspend

Message ID 20241105075733.66101-1-xni@redhat.com (mailing list archive)
State RFC
Headers show
Series [RFC,1/1] md: Use pers->quiesce in mddev_suspend | expand

Commit Message

Xiao Ni Nov. 5, 2024, 7:57 a.m. UTC
One customer reports a bug: raid5 hangs when changing the thread count
while a resync is running. The stripes are all in conf->handle_list and
the new threads can't handle them.

Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removed
pers->quiesce from mddev_suspend/resume, so we can no longer guarantee
that sync requests finish during a suspend operation. Each personality
knows itself best, so pers->quiesce is the proper way to let a
personality quiesce.

Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
Signed-off-by: Xiao Ni <xni@redhat.com>
---
 drivers/md/md.c | 6 ++++++
 1 file changed, 6 insertions(+)
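For readers following along, here is a minimal userspace model of the flow this patch proposes: mddev_suspend() bumps the suspend depth and then asks the personality to quiesce, so pending sync requests drain before the suspend completes. Everything below is an illustrative sketch with made-up helpers (fake_raid5_quiesce, pending_sync), not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

struct mddev;

struct md_personality {
	/* state = 1: quiesce (drain pending work, block new work); 0: resume */
	void (*quiesce)(struct mddev *mddev, int state);
};

struct mddev {
	struct md_personality *pers;
	int suspended;		/* suspend depth, like mddev->suspended */
	int quiesced;		/* stand-in for personality-internal state */
	int pending_sync;	/* stand-in for sync requests in handle_list */
};

/* Stand-in for raid5_quiesce(): "waits" until in-flight work is done. */
static void fake_raid5_quiesce(struct mddev *mddev, int state)
{
	if (state) {
		mddev->pending_sync = 0;	/* drain sync requests */
		mddev->quiesced = 1;
	} else {
		mddev->quiesced = 0;
	}
}

static struct md_personality fake_raid5 = { .quiesce = fake_raid5_quiesce };

static int mddev_suspend(struct mddev *mddev)
{
	mddev->suspended++;
	if (mddev->pers)			/* the hunk this patch adds */
		mddev->pers->quiesce(mddev, 1);
	return 0;
}

static void mddev_resume(struct mddev *mddev)
{
	if (mddev->pers)
		mddev->pers->quiesce(mddev, 0);
	mddev->suspended--;
}
```

With this ordering, once mddev_suspend() returns there is no sync io left for newly created raid5 worker threads to trip over, which is the hang the commit message describes.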

Comments

Yu Kuai Nov. 5, 2024, 8:14 a.m. UTC | #1
Hi,

On 2024/11/05 15:57, Xiao Ni wrote:
> One customer reports a bug: raid5 is hung when changing thread cnt
> while resync is running. The stripes are all in conf->handle_list
> and new threads can't handle them.
> 
> Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes
> pers->quiesce from mddev_suspend/resume, then we can't guarantee sync
> requests finish in suspend operation. One personality knows itself the
> best. So pers->quiesce is a proper way to let personality quiesce.

Do you mean that other than normal IO, raid5 expects sync IO to be done
as well by mddev_suspend()? If so, we'd better add some comments as
well.

Thanks,
Kuai

> 
> Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
> Signed-off-by: Xiao Ni <xni@redhat.com>
> ---
>   drivers/md/md.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 67108c397c5a..7409ecb2df68 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible)
>   		return err;
>   	}
>   
> +	if (mddev->pers)
> +		mddev->pers->quiesce(mddev, 1);
> +
>   	/*
>   	 * For raid456, io might be waiting for reshape to make progress,
>   	 * allow new reshape to start while waiting for io to be done to
> @@ -514,6 +517,9 @@ static void __mddev_resume(struct mddev *mddev, bool recovery_needed)
>   	percpu_ref_resurrect(&mddev->active_io);
>   	wake_up(&mddev->sb_wait);
>   
> +	if (mddev->pers)
> +		mddev->pers->quiesce(mddev, 0);
> +
>   	if (recovery_needed)
>   		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
>   	md_wakeup_thread(mddev->thread);
>
Mariusz Tkaczyk Nov. 5, 2024, 8:16 a.m. UTC | #2
On Tue,  5 Nov 2024 15:57:33 +0800
Xiao Ni <xni@redhat.com> wrote:

> One customer reports a bug: raid5 is hung when changing thread cnt
> while resync is running. The stripes are all in conf->handle_list
> and new threads can't handle them.

Is the issue fixed with this patch? It is missing here :)
> 
> Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes
> pers->quiesce from mddev_suspend/resume, then we can't guarantee sync
> requests finish in suspend operation. One personality knows itself the
> best. So pers->quiesce is a proper way to let personality quiesce.

> 
> Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
> Signed-off-by: Xiao Ni <xni@redhat.com>
> ---
>  drivers/md/md.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 67108c397c5a..7409ecb2df68 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible)
>  		return err;
>  	}
>  
> +	if (mddev->pers)
> +		mddev->pers->quiesce(mddev, 1);
> +

Shouldn't it be implemented as below? According to b39f35ebe86d, some
levels do not implement this?

+	if (mddev->pers && mddev->pers->quiesce)
+		mddev->pers->quiesce(mddev, 1);

Is it reproducible with the upstream kernel?

Thanks,
Mariusz
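The NULL-check question above comes down to two callback conventions, which the next reply clarifies: md guarantees ->quiesce is never NULL, because levels with nothing to drain point it at an empty function. A small sketch of that pattern (hypothetical names, not the kernel code):

```c
#include <assert.h>
#include <stddef.h>

struct dev;

struct pers_ops {
	void (*quiesce)(struct dev *d, int state);
};

static int quiesce_calls;

/* A level that really has work to drain. */
static void real_quiesce(struct dev *d, int state)
{
	(void)d; (void)state;
	quiesce_calls++;
}

/* A level with nothing to drain: an explicit no-op instead of NULL. */
static void noop_quiesce(struct dev *d, int state)
{
	(void)d; (void)state;
}

static struct pers_ops raid5_like  = { .quiesce = real_quiesce };
static struct pers_ops linear_like = { .quiesce = noop_quiesce };

struct dev {
	struct pers_ops *pers;	/* NULL only when the array isn't running */
};

static void suspend(struct dev *d)
{
	/*
	 * No "d->pers->quiesce != NULL" check needed: the convention is
	 * that ->quiesce always points somewhere, possibly at a no-op.
	 * Only d->pers itself can be NULL.
	 */
	if (d->pers)
		d->pers->quiesce(d, 1);
}
```

The no-op convention keeps every call site short at the cost of one trivial function per level; the per-call NULL check suggested above is equally valid where that convention isn't guaranteed.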
Yu Kuai Nov. 5, 2024, 8:29 a.m. UTC | #3
Hi,

On 2024/11/05 16:16, Mariusz Tkaczyk wrote:
> On Tue,  5 Nov 2024 15:57:33 +0800
> Xiao Ni <xni@redhat.com> wrote:
> 
>> One customer reports a bug: raid5 is hung when changing thread cnt
>> while resync is running. The stripes are all in conf->handle_list
>> and new threads can't handle them.
> 
> Is issue fixed with this patch? Is is missing here :)
>>
>> Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes
>> pers->quiesce from mddev_suspend/resume, then we can't guarantee sync
>> requests finish in suspend operation. One personality knows itself the
>> best. So pers->quiesce is a proper way to let personality quiesce.
> 
>>
>> Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
>> Signed-off-by: Xiao Ni <xni@redhat.com>
>> ---
>>   drivers/md/md.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 67108c397c5a..7409ecb2df68 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible)
>>   		return err;
>>   	}
>>   
>> +	if (mddev->pers)
>> +		mddev->pers->quiesce(mddev, 1);
>> +
> 
> Shouldn't it be implemented as below? According to b39f35ebe86d, some levels are
> not implementing this?
> 
> +	if (mddev->pers && mddev->pers->quiesce)
> +		mddev->pers->quiesce(mddev, 1);

It's fine, the fops is never NULL; some levels just point it at an
empty function.

The tricky part here is that accessing "mddev->pers" is not safe here:
'reconfig_mutex' is not held in mddev_suspend(), so it can run
concurrently with, for example, a level change. :(

Thanks,
Kuai

> 
> Is it reproducible with upstream kernel?
> 
> Thanks,
> Mariusz
> 
> .
>
Xiao Ni Nov. 5, 2024, 8:48 a.m. UTC | #4
On Tue, Nov 5, 2024 at 4:14 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/11/05 15:57, Xiao Ni wrote:
> > One customer reports a bug: raid5 is hung when changing thread cnt
> > while resync is running. The stripes are all in conf->handle_list
> > and new threads can't handle them.
> >
> > Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes
> > pers->quiesce from mddev_suspend/resume, then we can't guarantee sync
> > requests finish in suspend operation. One personality knows itself the
> > best. So pers->quiesce is a proper way to let personality quiesce.
>
> Do you mean that other than normal IO, raid5 expects sync IO to be done
> as well by mddev_suspend()? If so, we'd better add some comments as
> well.

Hi Kuai

Yes, before commit b39f35ebe86d, mddev_suspend() could guarantee that
all io (normal io from the filesystem and sync requests) finished. And
raid5's conf->handle_list holds both normal io and sync io, so the
suspend operation should wait for all of it to finish. I'll add more
comments.

Regards
Xiao
>
> Thanks,
> Kuai
>
> >
> > Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
> > Signed-off-by: Xiao Ni <xni@redhat.com>
> > ---
> >   drivers/md/md.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 67108c397c5a..7409ecb2df68 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible)
> >               return err;
> >       }
> >
> > +     if (mddev->pers)
> > +             mddev->pers->quiesce(mddev, 1);
> > +
> >       /*
> >        * For raid456, io might be waiting for reshape to make progress,
> >        * allow new reshape to start while waiting for io to be done to
> > @@ -514,6 +517,9 @@ static void __mddev_resume(struct mddev *mddev, bool recovery_needed)
> >       percpu_ref_resurrect(&mddev->active_io);
> >       wake_up(&mddev->sb_wait);
> >
> > +     if (mddev->pers)
> > +             mddev->pers->quiesce(mddev, 0);
> > +
> >       if (recovery_needed)
> >               set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> >       md_wakeup_thread(mddev->thread);
> >
>
Xiao Ni Nov. 5, 2024, 9:01 a.m. UTC | #5
On Tue, Nov 5, 2024 at 4:29 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/11/05 16:16, Mariusz Tkaczyk wrote:
> > On Tue,  5 Nov 2024 15:57:33 +0800
> > Xiao Ni <xni@redhat.com> wrote:
> >
> >> One customer reports a bug: raid5 is hung when changing thread cnt
> >> while resync is running. The stripes are all in conf->handle_list
> >> and new threads can't handle them.
> >
> > Is issue fixed with this patch? Is is missing here :)
> >>
> >> Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes
> >> pers->quiesce from mddev_suspend/resume, then we can't guarantee sync
> >> requests finish in suspend operation. One personality knows itself the
> >> best. So pers->quiesce is a proper way to let personality quiesce.
> >
> >>
> >> Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
> >> Signed-off-by: Xiao Ni <xni@redhat.com>
> >> ---
> >>   drivers/md/md.c | 6 ++++++
> >>   1 file changed, 6 insertions(+)
> >>
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index 67108c397c5a..7409ecb2df68 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible)
> >>              return err;
> >>      }
> >>
> >> +    if (mddev->pers)
> >> +            mddev->pers->quiesce(mddev, 1);
> >> +
> >
> > Shouldn't it be implemented as below? According to b39f35ebe86d, some levels are
> > not implementing this?
> >
> > +     if (mddev->pers && mddev->pers->quiesce)
> > +             mddev->pers->quiesce(mddev, 1);
>
> It's fine, the fops is never NULL, just some levels points to an empty
> function.
>
> The tricky part here is that accessing "mddev->pers" is not safe here,
> 'reconfig_mutex' is not held in mddev_suspend(), and can concurrent
> with, for example, change levels. :(

Hi Kuai

Now mddev->suspended is protected by mddev->suspend_mutex. Shouldn't
that avoid the problem you mentioned above? level_store() calls
mddev_suspend(), which increments mddev->suspended, so other paths will
return early because mddev->suspended is not 0.

Regards
Xiao
>
> Thanks,
> Kuai
>
> >
> > Is it reproducible with upstream kernel?
> >
> > Thanks,
> > Mariusz
> >
> > .
> >
>
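Xiao's argument above relies on mddev->suspended being a nesting depth guarded by suspend_mutex: once level_store() has suspended the array, any other suspend-sensitive path observes a nonzero depth. A userspace model of just that counter (illustrative names, not the kernel code):

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t suspend_mutex = PTHREAD_MUTEX_INITIALIZER;
static int suspended;	/* like mddev->suspended: a depth, not a bool */

static void suspend_model(void)
{
	pthread_mutex_lock(&suspend_mutex);
	suspended++;		/* first suspender quiesces; others nest */
	pthread_mutex_unlock(&suspend_mutex);
}

static void resume_model(void)
{
	pthread_mutex_lock(&suspend_mutex);
	suspended--;		/* only the last resume (depth 0) un-quiesces */
	pthread_mutex_unlock(&suspend_mutex);
}

static int is_suspended(void)
{
	return suspended != 0;	/* what "other paths" would check */
}
```

As Kuai notes in the next reply, this protects the counter itself but says nothing about mddev->pers, which is guarded by reconfig_mutex, not suspend_mutex.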
Yu Kuai Nov. 5, 2024, 9:22 a.m. UTC | #6
Hi,

On 2024/11/05 17:01, Xiao Ni wrote:
> On Tue, Nov 5, 2024 at 4:29 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2024/11/05 16:16, Mariusz Tkaczyk wrote:
>>> On Tue,  5 Nov 2024 15:57:33 +0800
>>> Xiao Ni <xni@redhat.com> wrote:
>>>
>>>> One customer reports a bug: raid5 is hung when changing thread cnt
>>>> while resync is running. The stripes are all in conf->handle_list
>>>> and new threads can't handle them.
>>>
>>> Is issue fixed with this patch? Is is missing here :)
>>>>
>>>> Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes
>>>> pers->quiesce from mddev_suspend/resume, then we can't guarantee sync
>>>> requests finish in suspend operation. One personality knows itself the
>>>> best. So pers->quiesce is a proper way to let personality quiesce.
>>>
>>>>
>>>> Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
>>>> Signed-off-by: Xiao Ni <xni@redhat.com>
>>>> ---
>>>>    drivers/md/md.c | 6 ++++++
>>>>    1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>>>> index 67108c397c5a..7409ecb2df68 100644
>>>> --- a/drivers/md/md.c
>>>> +++ b/drivers/md/md.c
>>>> @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible)
>>>>               return err;
>>>>       }
>>>>
>>>> +    if (mddev->pers)
>>>> +            mddev->pers->quiesce(mddev, 1);
>>>> +
>>>
>>> Shouldn't it be implemented as below? According to b39f35ebe86d, some levels are
>>> not implementing this?
>>>
>>> +     if (mddev->pers && mddev->pers->quiesce)
>>> +             mddev->pers->quiesce(mddev, 1);
>>
>> It's fine, the fops is never NULL, just some levels points to an empty
>> function.
>>
>> The tricky part here is that accessing "mddev->pers" is not safe here,
>> 'reconfig_mutex' is not held in mddev_suspend(), and can concurrent
>> with, for example, change levels. :(
> 
> Hi Kuai
> 
> Now mddev->suspended is protected by mddev->suspend_mutex. It should
> can avoid the problem you mentioned above? level_store calls
> mddev_suspend and mddev->suspended is set to mddev->suspended+1. So
> other paths will return because mddev->suspended is not 0.

Yeah, level_store() is just a wrong example; the key point here is that
mddev->pers is not protected by 'suspend_mutex'. The other places are
md_run() and md_stop(): those paths don't hold 'suspend_mutex', right?
And they look like they can run concurrently with a suspend, for
example from suspend_lo_store().

Did you consider just failing raid5_store_group_thread_cnt() if
sync_thread is active?

Or call ->quiesce() directly here with 'reconfig_mutex' held before
suspend?

Thanks,
Kuai

> 
> Regards
> Xiao
>>
>> Thanks,
>> Kuai
>>
>>>
>>> Is it reproducible with upstream kernel?
>>>
>>> Thanks,
>>> Mariusz
>>>
>>> .
>>>
>>
> 
> 
> .
>
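Kuai's second suggestion, quiescing with 'reconfig_mutex' held before entering the suspend machinery, can be sketched as below. The point is that ->pers is only dereferenced while the mutex that actually protects it is held. All names here are a userspace model (the helper name is hypothetical), not the kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct mddev_model;

struct pers_model {
	void (*quiesce)(struct mddev_model *m, int state);
};

struct mddev_model {
	pthread_mutex_t reconfig_mutex;	/* protects ->pers */
	pthread_mutex_t suspend_mutex;	/* protects ->suspended */
	struct pers_model *pers;
	int suspended;
	int quiesced;
};

static int quiesce_transitions;

static void model_quiesce(struct mddev_model *m, int state)
{
	m->quiesced = state;
	quiesce_transitions++;
}

static struct pers_model model_pers = { .quiesce = model_quiesce };

/* Caller-driven variant: quiesce under reconfig_mutex, then suspend. */
static void store_group_thread_cnt_model(struct mddev_model *m)
{
	pthread_mutex_lock(&m->reconfig_mutex);
	if (m->pers)				/* ->pers is stable here */
		m->pers->quiesce(m, 1);

	pthread_mutex_lock(&m->suspend_mutex);
	m->suspended++;
	pthread_mutex_unlock(&m->suspend_mutex);

	/* ... change group_thread_cnt here ... */

	pthread_mutex_lock(&m->suspend_mutex);
	m->suspended--;
	pthread_mutex_unlock(&m->suspend_mutex);

	if (m->pers)
		m->pers->quiesce(m, 0);
	pthread_mutex_unlock(&m->reconfig_mutex);
}
```

This keeps mddev_suspend() itself unchanged, at the cost of every such caller doing the quiesce/unquiesce dance explicitly.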
Xiao Ni Nov. 5, 2024, 10:35 a.m. UTC | #7
On Tue, Nov 5, 2024 at 5:22 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2024/11/05 17:01, Xiao Ni wrote:
> > On Tue, Nov 5, 2024 at 4:29 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2024/11/05 16:16, Mariusz Tkaczyk wrote:
> >>> On Tue,  5 Nov 2024 15:57:33 +0800
> >>> Xiao Ni <xni@redhat.com> wrote:
> >>>
> >>>> One customer reports a bug: raid5 is hung when changing thread cnt
> >>>> while resync is running. The stripes are all in conf->handle_list
> >>>> and new threads can't handle them.
> >>>
> >>> Is issue fixed with this patch? Is is missing here :)
> >>>>
> >>>> Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes
> >>>> pers->quiesce from mddev_suspend/resume, then we can't guarantee sync
> >>>> requests finish in suspend operation. One personality knows itself the
> >>>> best. So pers->quiesce is a proper way to let personality quiesce.
> >>>
> >>>>
> >>>> Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
> >>>> Signed-off-by: Xiao Ni <xni@redhat.com>
> >>>> ---
> >>>>    drivers/md/md.c | 6 ++++++
> >>>>    1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >>>> index 67108c397c5a..7409ecb2df68 100644
> >>>> --- a/drivers/md/md.c
> >>>> +++ b/drivers/md/md.c
> >>>> @@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible)
> >>>>               return err;
> >>>>       }
> >>>>
> >>>> +    if (mddev->pers)
> >>>> +            mddev->pers->quiesce(mddev, 1);
> >>>> +
> >>>
> >>> Shouldn't it be implemented as below? According to b39f35ebe86d, some levels are
> >>> not implementing this?
> >>>
> >>> +     if (mddev->pers && mddev->pers->quiesce)
> >>> +             mddev->pers->quiesce(mddev, 1);
> >>
> >> It's fine, the fops is never NULL, just some levels points to an empty
> >> function.
> >>
> >> The tricky part here is that accessing "mddev->pers" is not safe here,
> >> 'reconfig_mutex' is not held in mddev_suspend(), and can concurrent
> >> with, for example, change levels. :(
> >
> > Hi Kuai
> >
> > Now mddev->suspended is protected by mddev->suspend_mutex. It should
> > can avoid the problem you mentioned above? level_store calls
> > mddev_suspend and mddev->suspended is set to mddev->suspended+1. So
> > other paths will return because mddev->suspended is not 0.
>
> Yeah, level store is just an wrong example, key point here is that
> mddev->pers is not protected by 'suspend_mutex'. Other places are
> md_run() and md_stop(), these path doesn't hold 'suspend_mutex' right?
> And they look like can concurrent with suspend, for example,
> suspend_lo_store()

Yes, thanks for pointing this out. I need to think more about it.

>
> Did you consider just fail raid5_store_group_thread_cnt() if sync_thread
> is active?

It's one method, but not a good one.

>
> Or call ->quiesce() directly here with 'reconfig_mutex' held before
> suspend?

I'll think about this approach. Thanks!

Regards
Xiao
>
> Thanks,
> Kuai
>
> >
> > Regards
> > Xiao
> >>
> >> Thanks,
> >> Kuai
> >>
> >>>
> >>> Is it reproducible with upstream kernel?
> >>>
> >>> Thanks,
> >>> Mariusz
> >>>
> >>> .
> >>>
> >>
> >
> >
> > .
> >
>

Patch

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 67108c397c5a..7409ecb2df68 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -482,6 +482,9 @@  int mddev_suspend(struct mddev *mddev, bool interruptible)
 		return err;
 	}
 
+	if (mddev->pers)
+		mddev->pers->quiesce(mddev, 1);
+
 	/*
 	 * For raid456, io might be waiting for reshape to make progress,
 	 * allow new reshape to start while waiting for io to be done to
@@ -514,6 +517,9 @@  static void __mddev_resume(struct mddev *mddev, bool recovery_needed)
 	percpu_ref_resurrect(&mddev->active_io);
 	wake_up(&mddev->sb_wait);
 
+	if (mddev->pers)
+		mddev->pers->quiesce(mddev, 0);
+
 	if (recovery_needed)
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	md_wakeup_thread(mddev->thread);