Message ID | 20240426072135.291206-2-joel.colledge@linbit.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Mike Snitzer |
Headers | show |
Series | dm-delay: fix hung task issue | expand |
On 26/04/2024 08:21, Joel Colledge wrote: > If the worker thread is not woken due to a bio, then it is not woken at > all. This causes the hung task check to trigger. For instance, when a > delay of 0 is configured, delay_bio() returns without waking the worker, > so this situation occurs. > > Prevent the hung task check from triggering by waking up the newly > minted worker in delay_ctr(). > > Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq") > Signed-off-by: Joel Colledge <joel.colledge@linbit.com> > --- > drivers/md/dm-delay.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c > index 5eabdb06c649..003512bb5fea 100644 > --- a/drivers/md/dm-delay.c > +++ b/drivers/md/dm-delay.c > @@ -274,6 +274,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv) > dc->worker = NULL; > goto bad; > } > + wake_up_process(dc->worker); > } else { > timer_setup(&dc->delay_timer, handle_delayed_timer, 0); > INIT_WORK(&dc->flush_expired_bios, flush_expired_bios); Is this an issue for delay > 0 too somehow? Indeed if we don't have a delay the process will never be woken up, but in that case, why create the worker in the first place? You're missing lkml as recipient btw. Kind Regards, Christian
On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote: > Is this an issue for delay > 0 too somehow? I believe it is. If there is simply no IO to the delay device, then nothing will wake the new thread and the same issue will occur. I haven't yet reproduced this case, because the system I am testing on submits reads on any new block device and I don't know offhand how to disable them. I think the reads are from scans for multipathd, bcachefs or similar. > Indeed if we don't have a delay the process will never be woken up, > but in that case, why create the worker in the first place? I agree. If this were only an issue with delay == 0, it would make more sense not to create the worker at all. As mentioned above, I believe the issue can occur with delay > 0. So not creating the worker in the delay == 0 case is an additional optimization and out of the scope of this change. > You're missing lkml as recipient btw. Thanks for the pointer and thanks for your response! Best regards, Joel
On 30/04/2024 15:44, Joel Colledge wrote: > On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote: >> Is this an issue for delay > 0 too somehow? > > I believe it is. If there is simply no IO to the delay device, then > nothing will wake the new thread and the same issue will occur. Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE); instead of the wakeup. > > I haven't yet reproduced this case, because the system I am testing on > submits reads on any new block device and I don't know offhand how to > disable them. I think the reads are from scans for multipathd, > bcachefs or similar. > >> Indeed if we don't have a delay the process will never be woken up, >> but in that case, why create the worker in the first place? > > I agree. If this were only an issue with delay == 0, it would make > more sense not to create the worker at all. > > As mentioned above, I believe the issue can occur with delay > 0. So > not creating the worker in the delay == 0 case is an additional > optimization and out of the scope of this change. Ah right because of the delay_is_fast() definition that is slightly more work. Regards, Christian.
On Tue, 30 Apr 2024 at 17:27, Christian Loehle <christian.loehle@arm.com> wrote: > On 30/04/2024 15:44, Joel Colledge wrote: > > On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote: > >> Is this an issue for delay > 0 too somehow? > > > > I believe it is. If there is simply no IO to the delay device, then > > nothing will wake the new thread and the same issue will occur. > > Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE); > instead of the wakeup. I'm afraid I don't follow what you mean. set_current_state(TASK_INTERRUPTIBLE) would need to be called from the worker, but the worker isn't running yet, so it can't do that. When preparing this patch, I checked how common kthread_create() followed by wake_up_process() is. It is fairly common according to this very approximate metric: $ grep -r -I -A10 kthread_create | grep -c wake_up_process 49 I just looked through those matches in more detail and noticed that there is a kthread_run() macro which does exactly what we need. I will change my suggestion to use that instead. I will wait for more comments before sending a new version of the patch. Best regards, Joel
On 30/04/2024 17:12, Joel Colledge wrote: > On Tue, 30 Apr 2024 at 17:27, Christian Loehle <christian.loehle@arm.com> wrote: >> On 30/04/2024 15:44, Joel Colledge wrote: >>> On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote: >>>> Is this an issue for delay > 0 too somehow? >>> >>> I believe it is. If there is simply no IO to the delay device, then >>> nothing will wake the new thread and the same issue will occur. >> >> Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE); >> instead of the wakeup. > > I'm afraid I don't follow what you mean. > set_current_state(TASK_INTERRUPTIBLE) would need to be called from the > worker, but the worker isn't running yet, so it can't do that. Sorry about that, wasn't thinking :/ > > When preparing this patch, I checked how common kthread_create() > followed by wake_up_process() is. It is fairly common according to > this very approximate metric: > $ grep -r -I -A10 kthread_create | grep -c wake_up_process > 49 > > I just looked through those matches in more detail and noticed that > there is a kthread_run() macro which does exactly what we need. I will > change my suggestion to use that instead. > > I will wait for more comments before sending a new version of the patch. > > Best regards, > Joel
On Tue, Apr 30, 2024 at 06:12:37PM +0200, Joel Colledge wrote: > On Tue, 30 Apr 2024 at 17:27, Christian Loehle <christian.loehle@arm.com> wrote: > > On 30/04/2024 15:44, Joel Colledge wrote: > > > On Tue, 30 Apr 2024 at 16:28, Christian Loehle <christian.loehle@arm.com> wrote: > > >> Is this an issue for delay > 0 too somehow? > > > > > > I believe it is. If there is simply no IO to the delay device, then > > > nothing will wake the new thread and the same issue will occur. > > > > Yes, but might be better to just set_current_state(TASK_INTERRUPTIBLE); > > instead of the wakeup. > > I'm afraid I don't follow what you mean. > set_current_state(TASK_INTERRUPTIBLE) would need to be called from the > worker, but the worker isn't running yet, so it can't do that. > > When preparing this patch, I checked how common kthread_create() > followed by wake_up_process() is. It is fairly common according to > this very approximate metric: > $ grep -r -I -A10 kthread_create | grep -c wake_up_process > 49 > > I just looked through those matches in more detail and noticed that > there is a kthread_run() macro which does exactly what we need. I will > change my suggestion to use that instead. > > I will wait for more comments before sending a new version of the patch. kthread_run() looks like the right solution here. There's no need to wait for more initialization and the kthread will put itself to sleep momentarily. Please send an updated patch. -Ben > > Best regards, > Joel
diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c index 5eabdb06c649..003512bb5fea 100644 --- a/drivers/md/dm-delay.c +++ b/drivers/md/dm-delay.c @@ -274,6 +274,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv) dc->worker = NULL; goto bad; } + wake_up_process(dc->worker); } else { timer_setup(&dc->delay_timer, handle_delayed_timer, 0); INIT_WORK(&dc->flush_expired_bios, flush_expired_bios);
If the worker thread is not woken due to a bio, then it is not woken at all. This causes the hung task check to trigger. For instance, when a delay of 0 is configured, delay_bio() returns without waking the worker, so this situation occurs. Prevent the hung task check from triggering by waking up the newly minted worker in delay_ctr(). Fixes: 70bbeb29fab0 ("dm delay: for short delays, use kthread instead of timers and wq") Signed-off-by: Joel Colledge <joel.colledge@linbit.com> --- drivers/md/dm-delay.c | 1 + 1 file changed, 1 insertion(+)