diff mbox series

[V2,1/2] blk-mq: don't queue plugged passthrough requests into scheduler

Message ID 20230515144601.52811-2-ming.lei@redhat.com (mailing list archive)
State New, archived
Headers show
Series blk-mq: handle passthrough request as really passthrough | expand

Commit Message

Ming Lei May 15, 2023, 2:46 p.m. UTC
Passthrough(pt) request shouldn't be queued to scheduler, especially some
schedulers(such as bfq) supposes that req->bio is always available, then
blk-cgroup can be retrieved via bio.

We never let passthrough request cross scheduler before commit 1c2d2fff6dc0
("block: wire-up support for passthrough plugging").

Fix this issue by queuing pt request from plug list to hctx->dispatch
directly.

Reported-by: Guangwu Zhang <guazhang@redhat.com>
Closes: https://lore.kernel.org/linux-block/CAGS2=YosaYaUTEMU3uaf+y=8MqSrhL7sYsJn8EwbaM=76p_4Qg@mail.gmail.com/
Investigated-by: Yu Kuai <yukuai1@huaweicloud.com>
Fixes: 1c2d2fff6dc0 ("block: wire-up support for passthrough plugging")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig May 16, 2023, 6:22 a.m. UTC | #1
On Mon, May 15, 2023 at 10:46:00PM +0800, Ming Lei wrote:
> +		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
> +				pt != blk_rq_is_passthrough(rq)) {

Can your format this as:

		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
			   pt != blk_rq_is_passthrough(rq)) {

for readability?

> +			/*
> +			 * Both passthrough and flush request don't belong to
> +			 * scheduler, but flush request won't be added to plug
> +			 * list, so needn't handle here.
> +			 */
>  			rq_list_add_tail(&requeue_lastp, rq);

This comment confuses the heck out of me.  The check if for passthrough
vs non-passthrough and doesn't involved flush requests at all.

I'd prefer to drop it, and instead comment on passthrough requests
not going to the scheduled below where we actually issue other requests
to the scheduler.

> +	if (pt) {
> +		spin_lock(&this_hctx->lock);
> +		list_splice_tail_init(&list, &this_hctx->dispatch);
> +		spin_unlock(&this_hctx->lock);
> +		blk_mq_run_hw_queue(this_hctx, from_sched);

.. aka here.  But why can't we just use the blk_mq_insert_requests
for this case anyway?

As in just doing a:


-	if (this_hctx->queue->elevator) {
+	if (this_hctx->queue->elevator && !pt) {

?
Ming Lei May 16, 2023, 8:10 a.m. UTC | #2
On Tue, May 16, 2023 at 08:22:21AM +0200, Christoph Hellwig wrote:
> On Mon, May 15, 2023 at 10:46:00PM +0800, Ming Lei wrote:
> > +		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
> > +				pt != blk_rq_is_passthrough(rq)) {
> 
> Can your format this as:
> 
> 		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
> 			   pt != blk_rq_is_passthrough(rq)) {
> 
> for readability?

Do you mean indent for 'pt = blk_rq_is_passthrough(rq)' and keep 'pt' aligned
with 'if' in last line? Otherwise, can't see any difference between the two, :-(

> 
> > +			/*
> > +			 * Both passthrough and flush request don't belong to
> > +			 * scheduler, but flush request won't be added to plug
> > +			 * list, so needn't handle here.
> > +			 */
> >  			rq_list_add_tail(&requeue_lastp, rq);
> 
> This comment confuses the heck out of me.  The check if for passthrough
> vs non-passthrough and doesn't involved flush requests at all.
> 
> I'd prefer to drop it, and instead comment on passthrough requests
> not going to the scheduled below where we actually issue other requests
> to the scheduler.

Any request can be in plug list in theory, we just don't add flush request
to plug, that is why the above comment is added. If you don't like the
words for flush request, I can drop it.

> 
> > +	if (pt) {
> > +		spin_lock(&this_hctx->lock);
> > +		list_splice_tail_init(&list, &this_hctx->dispatch);
> > +		spin_unlock(&this_hctx->lock);
> > +		blk_mq_run_hw_queue(this_hctx, from_sched);
> 
> .. aka here.  But why can't we just use the blk_mq_insert_requests
> for this case anyway?

If the pt request is part of error recovery, it should be issued to
->dispatch list directly, so just for the sake of safety, meantime keep
same behavior with blk_mq_insert_request().

Thanks,
Ming
Christoph Hellwig May 17, 2023, 7:20 a.m. UTC | #3
On Tue, May 16, 2023 at 04:10:01PM +0800, Ming Lei wrote:
> On Tue, May 16, 2023 at 08:22:21AM +0200, Christoph Hellwig wrote:
> > On Mon, May 15, 2023 at 10:46:00PM +0800, Ming Lei wrote:
> > > +		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
> > > +				pt != blk_rq_is_passthrough(rq)) {
> > 
> > Can your format this as:
> > 
> > 		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
> > 			   pt != blk_rq_is_passthrough(rq)) {
> > 
> > for readability?
> 
> Do you mean indent for 'pt = blk_rq_is_passthrough(rq)' and keep 'pt' aligned
> with 'if' in last line?

Yes.

> > This comment confuses the heck out of me.  The check if for passthrough
> > vs non-passthrough and doesn't involved flush requests at all.
> > 
> > I'd prefer to drop it, and instead comment on passthrough requests
> > not going to the scheduled below where we actually issue other requests
> > to the scheduler.
> 
> Any request can be in plug list in theory, we just don't add flush request
> to plug, that is why the above comment is added. If you don't like the
> words for flush request, I can drop it.

I just don't think it maks any sense in this context.  If we want to
enforce the invariant that there's no flush request I'd rather add a
WARN_ON to not only talk about enforce it.  I'm not sure it's really
required, though.

> > > +	if (pt) {
> > > +		spin_lock(&this_hctx->lock);
> > > +		list_splice_tail_init(&list, &this_hctx->dispatch);
> > > +		spin_unlock(&this_hctx->lock);
> > > +		blk_mq_run_hw_queue(this_hctx, from_sched);
> > 
> > .. aka here.  But why can't we just use the blk_mq_insert_requests
> > for this case anyway?
> 
> If the pt request is part of error recovery, it should be issued to
> ->dispatch list directly, so just for the sake of safety, meantime keep
> same behavior with blk_mq_insert_request().

But if it is part of error recovery it won't be plugged.  Please don't
do weird cargo cult things here and just use the common helpers.
diff mbox series

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6dad0886a2f..b4aaf42f1125 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2711,6 +2711,7 @@  static void blk_mq_dispatch_plug_list(struct blk_plug *plug, bool from_sched)
 	struct request *requeue_list = NULL;
 	struct request **requeue_lastp = &requeue_list;
 	unsigned int depth = 0;
+	bool pt = false;
 	LIST_HEAD(list);
 
 	do {
@@ -2719,7 +2720,14 @@  static void blk_mq_dispatch_plug_list(struct blk_plug *plug, bool from_sched)
 		if (!this_hctx) {
 			this_hctx = rq->mq_hctx;
 			this_ctx = rq->mq_ctx;
-		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx) {
+			pt = blk_rq_is_passthrough(rq);
+		} else if (this_hctx != rq->mq_hctx || this_ctx != rq->mq_ctx ||
+				pt != blk_rq_is_passthrough(rq)) {
+			/*
+			 * Both passthrough and flush request don't belong to
+			 * scheduler, but flush request won't be added to plug
+			 * list, so needn't handle here.
+			 */
 			rq_list_add_tail(&requeue_lastp, rq);
 			continue;
 		}
@@ -2731,7 +2739,13 @@  static void blk_mq_dispatch_plug_list(struct blk_plug *plug, bool from_sched)
 	trace_block_unplug(this_hctx->queue, depth, !from_sched);
 
 	percpu_ref_get(&this_hctx->queue->q_usage_counter);
-	if (this_hctx->queue->elevator) {
+	/* passthrough requests don't belong to scheduler */
+	if (pt) {
+		spin_lock(&this_hctx->lock);
+		list_splice_tail_init(&list, &this_hctx->dispatch);
+		spin_unlock(&this_hctx->lock);
+		blk_mq_run_hw_queue(this_hctx, from_sched);
+	} else if (this_hctx->queue->elevator) {
 		this_hctx->queue->elevator->type->ops.insert_requests(this_hctx,
 				&list, 0);
 		blk_mq_run_hw_queue(this_hctx, from_sched);