diff mbox series

[1/1] dm-delay: fix hung task introduced by kthread mode

Message ID 20240426072135.291206-2-joel.colledge@linbit.com (mailing list archive)
State Superseded, archived
Delegated to: Mike Snitzer
Headers show
Series dm-delay: fix hung task issue | expand

Commit Message

Joel Colledge April 26, 2024, 7:21 a.m. UTC
If the worker thread is not woken due to a bio, then it is not woken at
all. This causes the hung task check to trigger. For instance, when a
delay of 0 is configured, delay_bio() returns without waking the worker,
so this situation occurs.

Prevent the hung task check from triggering by waking up the newly
minted worker in delay_ctr().

Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq")
Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
---
 drivers/md/dm-delay.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Christian Loehle April 30, 2024, 2:28 p.m. UTC | #1
On 26/04/2024 08:21, Joel Colledge wrote:
> If the worker thread is not woken due to a bio, then it is not woken at
> all. This causes the hung task check to trigger. For instance, when a
> delay of 0 is configured, delay_bio() returns without waking the worker,
> so this situation occurs.
> 
> Prevent the hung task check from triggering by waking up the newly
> minted worker in delay_ctr().
> 
> Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq")
> Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
> ---
>  drivers/md/dm-delay.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
> index 5eabdb06c649..003512bb5fea 100644
> --- a/drivers/md/dm-delay.c
> +++ b/drivers/md/dm-delay.c
> @@ -274,6 +274,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  			dc->worker = NULL;
>  			goto bad;
>  		}
> +		wake_up_process(dc->worker);
>  	} else {
>  		timer_setup(&dc->delay_timer, handle_delayed_timer, 0);
>  		INIT_WORK(&dc->flush_expired_bios, flush_expired_bios);

Is this an issue for delay > 0 too somehow?
Indeed if we don't have a delay the process will never be woken up,
but in that case, why create the worker in the first place?

You're missing lkml as recipient btw.

Kind Regards,
Christian
Joel Colledge April 30, 2024, 2:44 p.m. UTC | #2
On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
> Is this an issue for delay > 0 too somehow?

I believe it is. If there is simply no IO to the delay device, then
nothing will wake the new thread and the same issue will occur.

I haven't yet reproduced this case, because the system I am testing on
submits reads on any new block device and I don't know offhand how to
disable them. I think the reads are from scans for multipathd,
bcachefs or similar.

> Indeed if we don't have a delay the process will never be woken up,
> but in that case, why create the worker in the first place?

I agree. If this were only an issue with delay == 0, it would make
more sense not to create the worker at all.

As mentioned above, I believe the issue can occur with delay > 0. So
not creating the worker in the delay == 0 case is an additional
optimization and out of the scope of this change.

> You're missing lkml as recipient btw.

Thanks for the pointer and thanks for your response!

Best regards,
Joel
Christian Loehle April 30, 2024, 3:26 p.m. UTC | #3
On 30/04/2024 15:44, Joel Colledge wrote:
> On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
>> Is this an issue for delay > 0 too somehow?
> 
> I believe it is. If there is simply no IO to the delay device, then
> nothing will wake the new thread and the same issue will occur.

Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE);
instead of the wakeup.

> 
> I haven't yet reproduced this case, because the system I am testing on
> submits reads on any new block device and I don't know offhand how to
> disable them. I think the reads are from scans for multipathd,
> bcachefs or similar.
> 
>> Indeed if we don't have a delay the process will never be woken up,
>> but in that case, why create the worker in the first place?
> 
> I agree. If this were only an issue with delay == 0, it would make
> more sense not to create the worker at all.
> 
> As mentioned above, I believe the issue can occur with delay > 0. So
> not creating the worker in the delay == 0 case is an additional
> optimization and out of the scope of this change.

Ah right because of the delay_is_fast() definition that is slightly more
work.

Regards,
Christian.
Joel Colledge April 30, 2024, 4:12 p.m. UTC | #4
On Tue, 30 Apr 2024 at 17:27, Christian Loehle <christian.loehle@arm.com> wrote:
> On 30/04/2024 15:44, Joel Colledge wrote:
> > On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
> >> Is this an issue for delay > 0 too somehow?
> >
> > I believe it is. If there is simply no IO to the delay device, then
> > nothing will wake the new thread and the same issue will occur.
>
> Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE);
> instead of the wakeup.

I'm afraid I don't follow what you mean.
set_current_state(TASK_INTERRUPTIBLE) would need to be called from the
worker, but the worker isn't running yet, so it can't do that.

When preparing this patch, I checked how common kthread_create()
followed by wake_up_process() is. It is fairly common according to
this very approximate metric:
$ grep -r -I -A10 kthread_create | grep -c wake_up_process
49

I just looked through those matches in more detail and noticed that
there is a kthread_run() macro which does exactly what we need. I will
change my suggestion to use that instead.

I will wait for more comments before sending a new version of the patch.

Best regards,
Joel
Christian Loehle May 1, 2024, 8:26 a.m. UTC | #5
On 30/04/2024 17:12, Joel Colledge wrote:
> On Tue, 30 Apr 2024 at 17:27, Christian Loehle <christian.loehle@arm.com> wrote:
>> On 30/04/2024 15:44, Joel Colledge wrote:
>>> On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
>>>> Is this an issue for delay > 0 too somehow?
>>>
>>> I believe it is. If there is simply no IO to the delay device, then
>>> nothing will wake the new thread and the same issue will occur.
>>
>> Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE);
>> instead of the wakeup.
> 
> I'm afraid I don't follow what you mean.
> set_current_state(TASK_INTERRUPTIBLE) would need to be called from the
> worker, but the worker isn't running yet, so it can't do that.

Sorry about that, wasn't thinking :/

> 
> When preparing this patch, I checked how common kthread_create()
> followed by wake_up_process() is. It is fairly common according to
> this very approximate metric:
> $ grep -r -I -A10 kthread_create | grep -c wake_up_process
> 49
> 
> I just looked through those matches in more detail and noticed that
> there is a kthread_run() macro which does exactly what we need. I will
> change my suggestion to use that instead.
> 
> I will wait for more comments before sending a new version of the patch.
> 
> Best regards,
> Joel
Benjamin Marzinski May 3, 2024, 10:56 p.m. UTC | #6
On Tue, Apr 30, 2024 at 06:12:37PM +0200, Joel Colledge wrote:
> On Tue, 30 Apr 2024 at 17:27, Christian Loehle <christian.loehle@arm.com> wrote:
> > On 30/04/2024 15:44, Joel Colledge wrote:
> > > On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote:
> > >> Is this an issue for delay > 0 too somehow?
> > >
> > > I believe it is. If there is simply no IO to the delay device, then
> > > nothing will wake the new thread and the same issue will occur.
> >
> > Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE);
> > instead of the wakeup.
> 
> I'm afraid I don't follow what you mean.
> set_current_state(TASK_INTERRUPTIBLE) would need to be called from the
> worker, but the worker isn't running yet, so it can't do that.
> 
> When preparing this patch, I checked how common kthread_create()
> followed by wake_up_process() is. It is fairly common according to
> this very approximate metric:
> $ grep -r -I -A10 kthread_create | grep -c wake_up_process
> 49
> 
> I just looked through those matches in more detail and noticed that
> there is a kthread_run() macro which does exactly what we need. I will
> change my suggestion to use that instead.
> 
> I will wait for more comments before sending a new version of the patch.

kthread_run() looks like the right solution here. There's no need to
wait for more initialization and the kthread will put itself to sleep
momentarily.  Please send an updated patch.

-Ben

> 
> Best regards,
> Joel
diff mbox series

Patch

diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
index 5eabdb06c649..003512bb5fea 100644
--- a/drivers/md/dm-delay.c
+++ b/drivers/md/dm-delay.c
@@ -274,6 +274,7 @@  static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 			dc->worker = NULL;
 			goto bad;
 		}
+		wake_up_process(dc->worker);
 	} else {
 		timer_setup(&dc->delay_timer, handle_delayed_timer, 0);
 		INIT_WORK(&dc->flush_expired_bios, flush_expired_bios);