Message ID | 20211213054425.28121-1-dave@stgolabs.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | blk-mq: make synchronous hw_queue runs RT friendly | expand |
> +#ifndef CONFIG_PREEMPT_RT Please don't add these silly inverted ifdefs. > +static inline void blk_mq_start_sync_run_hw_queue(void) > +{ > + preempt_disable(); > +} > +static inline void blk_mq_end_sync_run_hw_queue(void) > +{ > + preempt_enable(); > +} > +#else > +static inline void blk_mq_start_sync_run_hw_queue(void) > +{ > + migrate_disable(); > +} > +static inline void blk_mq_end_sync_run_hw_queue(void) > +{ > + migrate_enable(); > +} > +#endif But more importantly: why isn't migrate_disable/enable doing the right thing for !PREEMPT_RT to avoid this mess?
On 2021-12-13 05:07:04 [-0800], Christoph Hellwig wrote: > But more importantly: why isn't migrate_disable/enable doing the right > thing for !PREEMPT_RT to avoid this mess? Thank you for asking the question. Sebastian
On Mon, 13 Dec 2021, Christoph Hellwig wrote: >But more importantly: why isn't migrate_disable/enable doing the right >thing for !PREEMPT_RT to avoid this mess? Please see Peter's description of the situation in af449901b84. While I'm not at all a fan of sprinkling migrate_disabling around code, I didn't want to add any overhead for the common case. If this, however, were not an issue (if most cases are async runs, for example) the ideal solution I think would be to just pin current to the hctx->cpumask. Thanks, Davidlohr
On Mon, Dec 13, 2021 at 11:05:29AM -0800, Davidlohr Bueso wrote: > On Mon, 13 Dec 2021, Christoph Hellwig wrote: > > But more importantly: why isn't migrate_disable/enable doing the right > > thing for !PREEMPT_RT to avoid this mess? > > Please see Peter's description of the situation in af449901b84. That explains why migrate_disable is a bad idea in PREEMPT_RT, not why it can't do something sensible for !PREEMPT_RT… > > While I'm not at all a fan of sprinkling migrate_disabling around code, > I didn't want to add any overhead for the common case. If this, however, > were not an issue (if most cases are async runs, for example) the ideal > solution I think would be to just pin current to the hctx->cpumask. sync running is the performance case.
diff --git a/block/blk-mq.c b/block/blk-mq.c index 8874a63ae952..d44b851fffba 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -1841,6 +1841,30 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx) return next_cpu; } +/* + * Mark regions to ensure that a synchronous hardware queue + * runs on a correct CPU. + */ +#ifndef CONFIG_PREEMPT_RT +static inline void blk_mq_start_sync_run_hw_queue(void) +{ + preempt_disable(); +} +static inline void blk_mq_end_sync_run_hw_queue(void) +{ + preempt_enable(); +} +#else +static inline void blk_mq_start_sync_run_hw_queue(void) +{ + migrate_disable(); +} +static inline void blk_mq_end_sync_run_hw_queue(void) +{ + migrate_enable(); +} +#endif + /** * __blk_mq_delay_run_hw_queue - Run (or schedule to run) a hardware queue. * @hctx: Pointer to the hardware queue to run. @@ -1857,14 +1881,14 @@ static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async, return; if (!async && !(hctx->flags & BLK_MQ_F_BLOCKING)) { - int cpu = get_cpu(); - if (cpumask_test_cpu(cpu, hctx->cpumask)) { + blk_mq_start_sync_run_hw_queue(); + if (cpumask_test_cpu(smp_processor_id(), hctx->cpumask)) { __blk_mq_run_hw_queue(hctx); - put_cpu(); + blk_mq_end_sync_run_hw_queue(); return; } - put_cpu(); + blk_mq_end_sync_run_hw_queue(); } kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work,
Disabling preemption for the synchronous part of __blk_mq_delay_run_hw_queue() is to ensure that the hw queue runs in the correct CPU. This does not play well with PREEMPT_RT as regular spinlocks can be taken at this time (such as the hctx->lock), triggering scheduling while atomic scenarios. Introduce regions to mark starting and ending such cases and allow RT to instead disable migration. While this actually better documents what is occurring (as it is not about preemption but CPU locality), doing so for the regular non-RT case can be too expensive. Similarly, instead of relying on preemption or migration tricks, the task could also be affined to the valid cpumask, but that too would be unnecessarily expensive. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> --- block/blk-mq.c | 32 ++++++++++++++++++++++++++++---- 1 file changed, 28 insertions(+), 4 deletions(-)