Message ID | 31b118f3-bc8d-b18b-c4b9-e57d74a73f@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] completion: move blk_wait_io to kernel/sched/completion.c |
On Wed, Apr 17, 2024 at 07:49:17PM +0200, Mikulas Patocka wrote:
> Index: linux-2.6/kernel/sched/completion.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
> +++ linux-2.6/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
> @@ -290,6 +290,26 @@ wait_for_completion_killable_timeout(str
>  EXPORT_SYMBOL(wait_for_completion_killable_timeout);
>  
>  /**
> + * wait_for_completion_long_io - waits for completion of a task
> + * @x: holds the state of this particular completion
> + *
> + * This is like wait_for_completion_io, but it doesn't warn if the wait takes
> + * too long.
> + */
> +void wait_for_completion_long_io(struct completion *x)
> +{
> +	/* Prevent hang_check timer from firing at us during very long I/O */
> +	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
> +
> +	if (timeout)
> +		while (!wait_for_completion_io_timeout(x, timeout))
> +			;
> +	else
> +		wait_for_completion_io(x);
> +}
> +EXPORT_SYMBOL(wait_for_completion_long_io);

Urgh, why is it a sane thing to circumvent the hang check timer?

On Wed, 17 Apr 2024, Peter Zijlstra wrote:

> On Wed, Apr 17, 2024 at 07:49:17PM +0200, Mikulas Patocka wrote:
> > Index: linux-2.6/kernel/sched/completion.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
> > +++ linux-2.6/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
> > @@ -290,6 +290,26 @@ wait_for_completion_killable_timeout(str
> >  EXPORT_SYMBOL(wait_for_completion_killable_timeout);
> >  
> >  /**
> > + * wait_for_completion_long_io - waits for completion of a task
> > + * @x: holds the state of this particular completion
> > + *
> > + * This is like wait_for_completion_io, but it doesn't warn if the wait takes
> > + * too long.
> > + */
> > +void wait_for_completion_long_io(struct completion *x)
> > +{
> > +	/* Prevent hang_check timer from firing at us during very long I/O */
> > +	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
> > +
> > +	if (timeout)
> > +		while (!wait_for_completion_io_timeout(x, timeout))
> > +			;
> > +	else
> > +		wait_for_completion_io(x);
> > +}
> > +EXPORT_SYMBOL(wait_for_completion_long_io);
> 
> Urgh, why is it a sane thing to circumvent the hang check timer?

The block layer already does it - the bios can have arbitrary size, so
waiting for them takes arbitrary time.

Mikulas

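Since a single bio wait has no useful upper bound, the helper bounds each individual sleep instead. The worked arithmetic below is a hedged sketch, assuming a 120-second hung-task timeout (the usual CONFIG_DEFAULT_HUNG_TASK_TIMEOUT) and HZ=250; the function names are hypothetical. Roughly, the hung-task detector only reports a TASK_UNINTERRUPTIBLE task whose context-switch count has not changed for a whole timeout period, and every expired timed wait wakes the task and puts it back to sleep, which bumps that count. Re-arming at half the timeout therefore keeps every uninterrupted sleep well below the threshold.

```c
#include <assert.h>

/*
 * Re-arm interval computed by blk_wait_io() and the proposed
 * wait_for_completion_long_io(): half the hung-task timeout, in jiffies.
 * 0 means the detector is disabled; the kernel code then falls back to
 * a single untimed wait_for_completion_io().
 */
unsigned long long_io_rearm_jiffies(unsigned long hung_timeout_secs,
				    unsigned long hz)
{
	return hung_timeout_secs * hz / 2;
}

/*
 * How many bounded waits expire (return 0) before an I/O lasting
 * io_jiffies completes, assuming completion never lands exactly on a
 * re-arm boundary. Each expiry is one wake-up followed by a fresh sleep.
 */
unsigned long expired_waits(unsigned long io_jiffies, unsigned long rearm)
{
	if (rearm == 0 || io_jiffies == 0)
		return 0;
	return (io_jiffies - 1) / rearm;
}

/* Worked numbers (assumed: 120 s hung-task timeout, HZ=250). */
void long_io_self_check(void)
{
	unsigned long rearm = long_io_rearm_jiffies(120, 250);

	assert(rearm == 15000);                       /* 60 s between wake-ups */
	assert(expired_waits(300 * 250, rearm) == 4); /* 5-minute I/O: wakes at 60/120/180/240 s */
	assert(expired_waits(10 * 250, rearm) == 0);  /* 10 s I/O: first wait succeeds */
	assert(long_io_rearm_jiffies(0, 250) == 0);   /* detector disabled */
}
```

Calling long_io_self_check() exercises the numbers; the point is that a five-minute I/O is split into five sleeps of at most 60 seconds each.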
On Wed, Apr 17, 2024 at 08:00:22PM +0200, Mikulas Patocka wrote:
> > > +EXPORT_SYMBOL(wait_for_completion_long_io);
> > 
> > Urgh, why is it a sane thing to circumvent the hang check timer?
> 
> The block layer already does it - the bios can have arbitrary size, so
> waiting for them takes arbitrary time.

And as mentioned the last few times around, I think we want a task
state to say that task can sleep long or even forever and not propagate
this hack even further.

On 4/17/24 10:57 PM, Christoph Hellwig wrote:
> On Wed, Apr 17, 2024 at 08:00:22PM +0200, Mikulas Patocka wrote:
>>>> +EXPORT_SYMBOL(wait_for_completion_long_io);
>>>
>>> Urgh, why is it a sane thing to circumvent the hang check timer?
>>
>> The block layer already does it - the bios can have arbitrary size, so
>> waiting for them takes arbitrary time.
>
> And as mentioned the last few times around, I think we want a task
> state to say that task can sleep long or even forever and not propagate
> this hack even further.

It certainly is a hack/work-around, but unless there are a lot more that
should be using something like this, I don't think adding extra core
complexity in terms of a special task state (or per-task flag, at least
that would be easier) is really warranted.

On Thu, Apr 18, 2024 at 08:30:14AM -0600, Jens Axboe wrote:
> It certainly is a hack/work-around, but unless there are a lot more that
> should be using something like this, I don't think adding extra core
> complexity in terms of a special task state (or per-task flag, at least
> that would be easier) is really warranted.

Basically any kernel thread doing on-demand work has the same problem.
It just has an easier workaround hack, as the kernel threads can simply
claim to do an interruptible sleep to not trigger the softlockup
warnings.

On 4/18/24 8:46 AM, Christoph Hellwig wrote:
> On Thu, Apr 18, 2024 at 08:30:14AM -0600, Jens Axboe wrote:
>> It certainly is a hack/work-around, but unless there are a lot more that
>> should be using something like this, I don't think adding extra core
>> complexity in terms of a special task state (or per-task flag, at least
>> that would be easier) is really warranted.
>
> Basically any kernel thread doing on-demand work has the same problem.
> It just has an easier workaround hack, as the kernel threads can simply
> claim to do an interruptible sleep to not trigger the softlockup
> warnings.

A kernel thread can just use TASK_INTERRUPTIBLE, as it doesn't take
signals anyway. But yeah, I guess you could view that as a work-around
as well. Outside of that, mostly only a block problem, where our sleep
is always uninterruptible. Unless there are similar hacks elsewhere in
the kernel that I'm not aware of?

On Wed, Apr 17, 2024 at 08:00:22PM +0200, Mikulas Patocka wrote:
> 
> 
> On Wed, 17 Apr 2024, Peter Zijlstra wrote:
> 
> > On Wed, Apr 17, 2024 at 07:49:17PM +0200, Mikulas Patocka wrote:
> > > Index: linux-2.6/kernel/sched/completion.c
> > > ===================================================================
> > > --- linux-2.6.orig/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
> > > +++ linux-2.6/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
> > > @@ -290,6 +290,26 @@ wait_for_completion_killable_timeout(str
> > >  EXPORT_SYMBOL(wait_for_completion_killable_timeout);
> > >  
> > >  /**
> > > + * wait_for_completion_long_io - waits for completion of a task
> > > + * @x: holds the state of this particular completion
> > > + *
> > > + * This is like wait_for_completion_io, but it doesn't warn if the wait takes
> > > + * too long.
> > > + */
> > > +void wait_for_completion_long_io(struct completion *x)
> > > +{
> > > +	/* Prevent hang_check timer from firing at us during very long I/O */
> > > +	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
> > > +
> > > +	if (timeout)
> > > +		while (!wait_for_completion_io_timeout(x, timeout))
> > > +			;
> > > +	else
> > > +		wait_for_completion_io(x);
> > > +}
> > > +EXPORT_SYMBOL(wait_for_completion_long_io);
> > 
> > Urgh, why is it a sane thing to circumvent the hang check timer?
> 
> The block layer already does it - the bios can have arbitrary size, so
> waiting for them takes arbitrary time.

Yeah, but now you make it generic and your comment doesn't warn people
away, it makes them think this is a sane thing to do.

On Wed, Apr 17, 2024 at 09:57:04PM -0700, Christoph Hellwig wrote:
> On Wed, Apr 17, 2024 at 08:00:22PM +0200, Mikulas Patocka wrote:
> > > > +EXPORT_SYMBOL(wait_for_completion_long_io);
> > > 
> > > Urgh, why is it a sane thing to circumvent the hang check timer?
> > 
> > The block layer already does it - the bios can have arbitrary size, so
> > waiting for them takes arbitrary time.
> 
> And as mentioned the last few times around, I think we want a task
> state to say that task can sleep long or even forever and not propagate
> this hack even further.

A bit like TASK_NOLOAD (which is used to make TASK_IDLE work), but
different I suppose.

TASK_NOHUNG would be trivial to add ofc. But is it worth it?

Anyway, as per the other email, anything like this needs to come with a
big fat warning. You get to keep the pieces etc..

---
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3c2abbc587b4..83b25327c233 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -112,7 +112,8 @@ struct user_event_mm;
 #define TASK_FREEZABLE			0x00002000
 #define __TASK_FREEZABLE_UNSAFE	       (0x00004000 * IS_ENABLED(CONFIG_LOCKDEP))
 #define TASK_FROZEN			0x00008000
-#define TASK_STATE_MAX			0x00010000
+#define TASK_NOHUNG			0x00010000
+#define TASK_STATE_MAX			0x00020000
 
 #define TASK_ANY			(TASK_STATE_MAX-1)
 
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index b2fc2727d654..126fac835e5e 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -210,7 +210,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		state = READ_ONCE(t->__state);
 		if ((state & TASK_UNINTERRUPTIBLE) &&
 		    !(state & TASK_WAKEKILL) &&
-		    !(state & TASK_NOLOAD))
+		    !(state & TASK_NOLOAD) &&
+		    !(state & TASK_NOHUNG))
 			check_hung_task(t, timeout);
 	}
  unlock:

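To make the proposed filter concrete, here is a userspace copy of the condition in check_hung_uninterruptible_tasks() with the TASK_NOHUNG line applied. It is a sketch, not kernel code: hung_check_applies() and nohung_self_check() are hypothetical names, TASK_NOHUNG exists only as the proposal in this thread, and the other bit values are transcribed by hand from include/linux/sched.h of roughly this era, so treat them as an assumption and check them against your tree. It also illustrates Jens' point that a TASK_INTERRUPTIBLE sleeper is never examined at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, hand-copied from include/linux/sched.h (circa v6.9). */
#define TASK_RUNNING		0x00000000
#define TASK_INTERRUPTIBLE	0x00000001
#define TASK_UNINTERRUPTIBLE	0x00000002
#define TASK_WAKEKILL		0x00000100
#define TASK_NOLOAD		0x00000400
#define TASK_NOHUNG		0x00010000	/* proposed in this thread only */

#define TASK_KILLABLE		(TASK_WAKEKILL | TASK_UNINTERRUPTIBLE)
#define TASK_IDLE		(TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* Mirrors the patched test: true means check_hung_task() would run. */
bool hung_check_applies(unsigned int state)
{
	return (state & TASK_UNINTERRUPTIBLE) &&
	       !(state & TASK_WAKEKILL) &&
	       !(state & TASK_NOLOAD) &&
	       !(state & TASK_NOHUNG);
}

void nohung_self_check(void)
{
	assert(hung_check_applies(TASK_UNINTERRUPTIBLE));		/* plain D state */
	assert(!hung_check_applies(TASK_INTERRUPTIBLE));		/* kthread workaround */
	assert(!hung_check_applies(TASK_KILLABLE));			/* fatal signals can wake it */
	assert(!hung_check_applies(TASK_IDLE));				/* TASK_NOLOAD opts out */
	assert(!hung_check_applies(TASK_UNINTERRUPTIBLE | TASK_NOHUNG));	/* new opt-out */
}
```

The design choice Peter points at is visible here: TASK_NOHUNG is a pure opt-out bit, like TASK_NOLOAD, but without also removing the task from the load average.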
On Mon, 22 Apr 2024, Peter Zijlstra wrote:

> On Wed, Apr 17, 2024 at 09:57:04PM -0700, Christoph Hellwig wrote:
> > On Wed, Apr 17, 2024 at 08:00:22PM +0200, Mikulas Patocka wrote:
> > > > > +EXPORT_SYMBOL(wait_for_completion_long_io);
> > > > 
> > > > Urgh, why is it a sane thing to circumvent the hang check timer?
> > > 
> > > The block layer already does it - the bios can have arbitrary size, so
> > > waiting for them takes arbitrary time.
> > 
> > And as mentioned the last few times around, I think we want a task
> > state to say that task can sleep long or even forever and not propagate
> > this hack even further.
> 
> A bit like TASK_NOLOAD (which is used to make TASK_IDLE work), but
> different I suppose.
> 
> TASK_NOHUNG would be trivial to add ofc. But is it worth it?
> 
> Anyway, as per the other email, anything like this needs to come with a
> big fat warning. You get to keep the pieces etc..

This seems better than the blk_wait_io hack.

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>

> ---
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 3c2abbc587b4..83b25327c233 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -112,7 +112,8 @@ struct user_event_mm;
>  #define TASK_FREEZABLE			0x00002000
>  #define __TASK_FREEZABLE_UNSAFE	       (0x00004000 * IS_ENABLED(CONFIG_LOCKDEP))
>  #define TASK_FROZEN			0x00008000
> -#define TASK_STATE_MAX			0x00010000
> +#define TASK_NOHUNG			0x00010000
> +#define TASK_STATE_MAX			0x00020000
>  
>  #define TASK_ANY			(TASK_STATE_MAX-1)
>  
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index b2fc2727d654..126fac835e5e 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -210,7 +210,8 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  		state = READ_ONCE(t->__state);
>  		if ((state & TASK_UNINTERRUPTIBLE) &&
>  		    !(state & TASK_WAKEKILL) &&
> -		    !(state & TASK_NOLOAD))
> +		    !(state & TASK_NOLOAD) &&
> +		    !(state & TASK_NOHUNG))
>  			check_hung_task(t, timeout);
>  	}
>   unlock:
> 

On Mon, Apr 22, 2024 at 12:59:56PM +0200, Peter Zijlstra wrote:
> A bit like TASK_NOLOAD (which is used to make TASK_IDLE work), but
> different I suppose.
> 
> TASK_NOHUNG would be trivial to add ofc. But is it worth it?

Yes. And it would allow us to kill the horrible existing block hack.

Index: linux-2.6/block/blk.h
===================================================================
--- linux-2.6.orig/block/blk.h	2024-04-17 19:41:14.000000000 +0200
+++ linux-2.6/block/blk.h	2024-04-17 19:41:14.000000000 +0200
@@ -72,18 +72,6 @@ static inline int bio_queue_enter(struct
 	return __bio_queue_enter(q, bio);
 }
 
-static inline void blk_wait_io(struct completion *done)
-{
-	/* Prevent hang_check timer from firing at us during very long I/O */
-	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
-
-	if (timeout)
-		while (!wait_for_completion_io_timeout(done, timeout))
-			;
-	else
-		wait_for_completion_io(done);
-}
-
 #define BIO_INLINE_VECS 4
 struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
 		gfp_t gfp_mask);
Index: linux-2.6/include/linux/completion.h
===================================================================
--- linux-2.6.orig/include/linux/completion.h	2024-04-17 19:41:14.000000000 +0200
+++ linux-2.6/include/linux/completion.h	2024-04-17 19:41:14.000000000 +0200
@@ -112,6 +112,7 @@ extern long wait_for_completion_interrup
 	struct completion *x, unsigned long timeout);
 extern long wait_for_completion_killable_timeout(
 	struct completion *x, unsigned long timeout);
+extern void wait_for_completion_long_io(struct completion *x);
 extern bool try_wait_for_completion(struct completion *x);
 extern bool completion_done(struct completion *x);
 
Index: linux-2.6/block/bio.c
===================================================================
--- linux-2.6.orig/block/bio.c	2024-04-17 19:41:14.000000000 +0200
+++ linux-2.6/block/bio.c	2024-04-17 19:41:14.000000000 +0200
@@ -1378,7 +1378,7 @@ int submit_bio_wait(struct bio *bio)
 	bio->bi_end_io = submit_bio_wait_endio;
 	bio->bi_opf |= REQ_SYNC;
 	submit_bio(bio);
-	blk_wait_io(&done);
+	wait_for_completion_long_io(&done);
 
 	return blk_status_to_errno(bio->bi_status);
 }
Index: linux-2.6/block/blk-mq.c
===================================================================
--- linux-2.6.orig/block/blk-mq.c	2024-04-17 19:41:14.000000000 +0200
+++ linux-2.6/block/blk-mq.c	2024-04-17 19:41:14.000000000 +0200
@@ -1407,7 +1407,7 @@ blk_status_t blk_execute_rq(struct reque
 	if (blk_rq_is_poll(rq))
 		blk_rq_poll_completion(rq, &wait.done);
 	else
-		blk_wait_io(&wait.done);
+		wait_for_completion_long_io(&wait.done);
 
 	return wait.ret;
 }
Index: linux-2.6/kernel/sched/completion.c
===================================================================
--- linux-2.6.orig/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
+++ linux-2.6/kernel/sched/completion.c	2024-04-17 19:41:14.000000000 +0200
@@ -290,6 +290,26 @@ wait_for_completion_killable_timeout(str
 EXPORT_SYMBOL(wait_for_completion_killable_timeout);
 
 /**
+ * wait_for_completion_long_io - waits for completion of a task
+ * @x: holds the state of this particular completion
+ *
+ * This is like wait_for_completion_io, but it doesn't warn if the wait takes
+ * too long.
+ */
+void wait_for_completion_long_io(struct completion *x)
+{
+	/* Prevent hang_check timer from firing at us during very long I/O */
+	unsigned long timeout = sysctl_hung_task_timeout_secs * HZ / 2;
+
+	if (timeout)
+		while (!wait_for_completion_io_timeout(x, timeout))
+			;
+	else
+		wait_for_completion_io(x);
+}
+EXPORT_SYMBOL(wait_for_completion_long_io);
+
+/**
  * try_wait_for_completion - try to decrement a completion without blocking
  * @x:	completion structure
  *

The block layer has a function blk_wait_io - it works like
wait_for_completion_io, except that it doesn't warn if the wait takes too
long. This commit renames the function to wait_for_completion_long_io and
moves it to kernel/sched/completion.c so that other kernel subsystems can
use it. It will be needed by the dm-io subsystem.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 block/bio.c                |    2 +-
 block/blk-mq.c             |    2 +-
 block/blk.h                |   12 ------------
 include/linux/completion.h |    1 +
 kernel/sched/completion.c  |   20 ++++++++++++++++++++
 5 files changed, 23 insertions(+), 14 deletions(-)
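The shape of wait_for_completion_long_io() can be mimicked in userspace: a bounded wait is re-armed until the completion fires, so the waiter never stays in one sleep longer than the re-arm interval, and a zero interval falls back to one untimed wait. This is a POSIX-threads analogue under stated assumptions, not kernel code: struct ucompletion, uwait_timeout(), uwait_long() and run_demo() are hypothetical names, and pthread_cond_timedwait() stands in for wait_for_completion_io_timeout().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for struct completion. */
struct ucompletion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void ucompletion_init(struct ucompletion *x)
{
	pthread_mutex_init(&x->lock, NULL);
	pthread_cond_init(&x->cond, NULL);
	x->done = false;
}

static void ucomplete(struct ucompletion *x)
{
	pthread_mutex_lock(&x->lock);
	x->done = true;
	pthread_cond_broadcast(&x->cond);
	pthread_mutex_unlock(&x->lock);
}

/* Analogue of wait_for_completion_io_timeout(): true if completed. */
static bool uwait_timeout(struct ucompletion *x, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	bool done;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&x->lock);
	/* rc == 0 covers spurious wake-ups; ETIMEDOUT ends the wait. */
	while (!x->done && rc == 0)
		rc = pthread_cond_timedwait(&x->cond, &x->lock, &deadline);
	done = x->done;
	pthread_mutex_unlock(&x->lock);
	return done;
}

/* Analogue of wait_for_completion_io(): no timeout at all. */
static void uwait(struct ucompletion *x)
{
	pthread_mutex_lock(&x->lock);
	while (!x->done)
		pthread_cond_wait(&x->cond, &x->lock);
	pthread_mutex_unlock(&x->lock);
}

/* Same structure as the patch; returns how many bounded waits expired. */
static unsigned int uwait_long(struct ucompletion *x, long rearm_ms)
{
	unsigned int expired = 0;

	if (rearm_ms > 0) {
		while (!uwait_timeout(x, rearm_ms))
			expired++;
	} else {
		uwait(x);
	}
	return expired;
}

struct fake_io {
	struct ucompletion done;
	long duration_ms;
};

static void *fake_io_thread(void *arg)
{
	struct fake_io *io = arg;
	struct timespec d = { .tv_sec = io->duration_ms / 1000,
			      .tv_nsec = (io->duration_ms % 1000) * 1000000L };

	nanosleep(&d, NULL);	/* pretend to be a slow device */
	ucomplete(&io->done);
	return NULL;
}

/* Runs a fake I/O of io_ms and waits for it with uwait_long(). */
unsigned int run_demo(long io_ms, long rearm_ms)
{
	struct fake_io io = { .duration_ms = io_ms };
	pthread_t t;
	unsigned int expired;
	int rc;

	ucompletion_init(&io.done);
	rc = pthread_create(&t, NULL, fake_io_thread, &io);
	assert(rc == 0);
	(void)rc;
	expired = uwait_long(&io.done, rearm_ms);
	pthread_join(t, NULL);
	return expired;
}
```

For example, run_demo(200, 20) waits for a 200 ms fake I/O in 20 ms slices, so several bounded waits expire before it returns; run_demo(50, 0) takes the untimed branch and always returns 0. Exact counts depend on scheduling, which is why only the lower bound is meaningful.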