From patchwork Sat Sep 30 10:27:20 2017
X-Patchwork-Submitter: Ming Lei
X-Patchwork-Id: 9979381
From: Ming Lei
To: Jens Axboe, linux-block@vger.kernel.org, Christoph Hellwig, Mike Snitzer, dm-devel@redhat.com
Cc: Bart Van Assche, Laurence Oberman, Paolo Valente, Oleksandr Natalenko, Tom Nguyen, linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org, Omar Sandoval, Ming Lei
Subject: [PATCH V5 7/7] blk-mq-sched: don't dequeue request until all in ->dispatch are flushed
Date: Sat, 30 Sep 2017 18:27:20 +0800
Message-Id: <20170930102720.30219-8-ming.lei@redhat.com>
In-Reply-To: <20170930102720.30219-1-ming.lei@redhat.com>
References: <20170930102720.30219-1-ming.lei@redhat.com>

During dispatch, we move all requests from hctx->dispatch to a temporary
list and then dispatch them one by one from that list. Unfortunately,
during this period a queue run from another context may find ->dispatch
empty, conclude that the queue is idle, and start dequeuing requests from
the sw/scheduler queues and trying to dispatch them.

This hurts sequential I/O performance, because requests are dequeued
while the LLD queue is still busy.

This patch introduces the BLK_MQ_S_DISPATCH_BUSY state to make sure that
no request is dequeued until ->dispatch has been flushed. The bit is set
under hctx->lock whenever requests are added to ->dispatch, and it is
cleared only after ->dispatch is found empty; while it is set,
blk_mq_sched_dispatch_requests() returns without dequeuing anything new.
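The flag protocol can be illustrated with a small user-space model. This is a minimal sketch under stated assumptions, not kernel code: `struct hw_ctx`, `submit()`, `dispatch_list()`, `run_queue()`, the fixed-size arrays and the `budget` counter (standing in for an LLD that accepts only a limited number of requests) are all hypothetical. A pthread mutex stands in for `hctx->lock` and a C11 atomic for `hctx->state`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAXQ 16
/* Hypothetical stand-in for the BLK_MQ_S_DISPATCH_BUSY bit. */
#define DISPATCH_BUSY (1ul << 0)

/* Toy model of a hardware context; every name here is illustrative. */
struct hw_ctx {
	pthread_mutex_t lock;   /* models hctx->lock */
	int dispatch[MAXQ];     /* models hctx->dispatch (refused requests) */
	int ndispatch;
	int swq[MAXQ];          /* models the sw/scheduler queue */
	int nswq, swq_head;
	atomic_ulong state;     /* models hctx->state */
	int budget;             /* requests the fake driver will still accept */
	int sent[2 * MAXQ];     /* order in which the driver accepted requests */
	int nsent;
};

static void hw_ctx_init(struct hw_ctx *h, int budget)
{
	pthread_mutex_init(&h->lock, NULL);
	h->ndispatch = h->nswq = h->swq_head = h->nsent = 0;
	atomic_init(&h->state, 0ul);
	h->budget = budget;
}

static bool is_busy(struct hw_ctx *h)
{
	return (atomic_load(&h->state) & DISPATCH_BUSY) != 0;
}

static void submit(struct hw_ctx *h, int rq)
{
	assert(h->nswq < MAXQ);
	h->swq[h->nswq++] = rq;
}

/* Hand a list to the fake driver. Whatever it refuses goes back on
 * ->dispatch, and the busy bit is set under the same lock, mirroring
 * the set_bit() added to blk_mq_dispatch_rq_list(). */
static void dispatch_list(struct hw_ctx *h, const int *list, int n)
{
	int i = 0;

	while (i < n && h->budget > 0) {
		assert(h->nsent < 2 * MAXQ);
		h->sent[h->nsent++] = list[i++];
		h->budget--;
	}
	if (i < n) {
		pthread_mutex_lock(&h->lock);
		for (; i < n; i++) {
			assert(h->ndispatch < MAXQ);
			h->dispatch[h->ndispatch++] = list[i];
		}
		atomic_fetch_or(&h->state, DISPATCH_BUSY);
		pthread_mutex_unlock(&h->lock);
	}
}

static void run_queue(struct hw_ctx *h)
{
	int tmp[MAXQ];
	int n;

	/* Move every leftover request from ->dispatch to a private list. */
	pthread_mutex_lock(&h->lock);
	n = h->ndispatch;
	for (int i = 0; i < n; i++)
		tmp[i] = h->dispatch[i];
	h->ndispatch = 0;
	pthread_mutex_unlock(&h->lock);

	if (n > 0) {
		dispatch_list(h, tmp, n);
		/* Clear the bit only once ->dispatch has drained. As in the
		 * patch, a set from another context between this check and
		 * the clear can be lost; that only dequeues one request early. */
		pthread_mutex_lock(&h->lock);
		bool drained = (h->ndispatch == 0);
		pthread_mutex_unlock(&h->lock);
		if (drained)
			atomic_fetch_and(&h->state, ~DISPATCH_BUSY);
	}

	/* Leftovers still pending: do not dequeue new work early. */
	if (is_busy(h))
		return;

	/* Dequeue from the sw queue one request at a time. */
	while (h->swq_head < h->nswq) {
		int rq = h->swq[h->swq_head++];

		dispatch_list(h, &rq, 1);
		if (is_busy(h))
			break;  /* driver refused rq; it now sits on ->dispatch */
	}
}
```

With a budget of 2 and requests 1 to 4 submitted, the first `run_queue()` sends 1 and 2 and parks 3 on `->dispatch` with the bit set. A second run while the driver is still full returns early without dequeuing 4, which is the premature dequeue the patch avoids. Once the budget is restored, 3 is sent, the bit is cleared, and 4 follows.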
Reviewed-by: Bart Van Assche
Tested-by: Oleksandr Natalenko
Tested-by: Tom Nguyen
Tested-by: Paolo Valente
Signed-off-by: Ming Lei
Reviewed-by: Christoph Hellwig
---
 block/blk-mq-debugfs.c |  1 +
 block/blk-mq-sched.c   | 53 +++++++++++++++++++++++++++++++++-----------------
 block/blk-mq.c         |  6 ++++++
 include/linux/blk-mq.h |  1 +
 4 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 813ca3bbbefc..f1a62c0d1acc 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -182,6 +182,7 @@ static const char *const hctx_state_name[] = {
 	HCTX_STATE_NAME(SCHED_RESTART),
 	HCTX_STATE_NAME(TAG_WAITING),
 	HCTX_STATE_NAME(START_ON_RUN),
+	HCTX_STATE_NAME(DISPATCH_BUSY),
 };
 #undef HCTX_STATE_NAME
 
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 3ba112d9dc15..c5eac1eee442 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -146,7 +146,6 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	struct request_queue *q = hctx->queue;
 	struct elevator_queue *e = q->elevator;
 	const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
-	bool do_sched_dispatch = true;
 	LIST_HEAD(rq_list);
 
 	/* RCU or SRCU read lock is needed before checking quiesced flag */
@@ -177,8 +176,33 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 	 */
 	if (!list_empty(&rq_list)) {
 		blk_mq_sched_mark_restart_hctx(hctx);
-		do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list);
-	} else if (!has_sched_dispatch && !q->queue_depth) {
+		blk_mq_dispatch_rq_list(q, &rq_list);
+
+		/*
+		 * We may clear DISPATCH_BUSY just after it
+		 * is set from another context, the only cost
+		 * is that one request is dequeued a bit early,
+		 * we can survive that. Given the window is
+		 * small enough, no need to worry about performance
+		 * effect.
+		 */
+		if (list_empty_careful(&hctx->dispatch))
+			clear_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
+	}
+
+	/*
+	 * If DISPATCH_BUSY is set, that means hw queue is busy
+	 * and requests in the list of hctx->dispatch need to
+	 * be flushed first, so return early.
+	 *
+	 * Wherever DISPATCH_BUSY is set, blk_mq_run_hw_queue()
+	 * will be run to try to make progress, so it is always
+	 * safe to check the state here.
+	 */
+	if (test_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state))
+		return;
+
+	if (!has_sched_dispatch) {
 		/*
 		 * If there is no per-request_queue depth, we
 		 * flush all requests in this hw queue, otherwise
@@ -187,22 +211,15 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
 		 * run out of resource, which can be triggered
 		 * easily by per-request_queue queue depth
 		 */
-		blk_mq_flush_busy_ctxs(hctx, &rq_list);
-		blk_mq_dispatch_rq_list(q, &rq_list);
-	}
-
-	if (!do_sched_dispatch)
-		return;
-
-	/*
-	 * We want to dispatch from the scheduler if there was nothing
-	 * on the dispatch list or we were able to dispatch from the
-	 * dispatch list.
-	 */
-	if (has_sched_dispatch)
+		if (!q->queue_depth) {
+			blk_mq_flush_busy_ctxs(hctx, &rq_list);
+			blk_mq_dispatch_rq_list(q, &rq_list);
+		} else {
+			blk_mq_do_dispatch_ctx(q, hctx);
+		}
+	} else {
 		blk_mq_do_dispatch_sched(q, e, hctx);
-	else
-		blk_mq_do_dispatch_ctx(q, hctx);
+	}
 }
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8b49af1ade7f..7cb3f87334c0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1142,6 +1142,11 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list)
 
 		spin_lock(&hctx->lock);
 		list_splice_init(list, &hctx->dispatch);
+		/*
+		 * DISPATCH_BUSY won't be cleared until all requests
+		 * in hctx->dispatch are dispatched successfully
+		 */
+		set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
 		spin_unlock(&hctx->lock);
 
 		/*
@@ -1446,6 +1451,7 @@ static void blk_mq_request_direct_insert(struct blk_mq_hw_ctx *hctx,
 {
 	spin_lock(&hctx->lock);
 	list_add_tail(&rq->queuelist, &hctx->dispatch);
+	set_bit(BLK_MQ_S_DISPATCH_BUSY, &hctx->state);
 	spin_unlock(&hctx->lock);
 
 	blk_mq_run_hw_queue(hctx, false);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index fccabe00fb55..aa9853ada8b8 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -172,6 +172,7 @@ enum {
 	BLK_MQ_S_SCHED_RESTART = 2,
 	BLK_MQ_S_TAG_WAITING = 3,
 	BLK_MQ_S_START_ON_RUN = 4,
+	BLK_MQ_S_DISPATCH_BUSY = 5,
 
 	BLK_MQ_MAX_DEPTH = 10240,