From patchwork Mon Jan 17 08:54:53 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 12714997
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by
	smtp.lore.kernel.org (Postfix) with ESMTP id 57D85C4332F for ;
	Mon, 17 Jan 2022 08:43:57 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id
	S238312AbiAQIn4 (ORCPT ); Mon, 17 Jan 2022 03:43:56 -0500
Received: from szxga01-in.huawei.com ([45.249.212.187]:35846 "EHLO
	szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with
	ESMTP id S233239AbiAQInz (ORCPT ); Mon, 17 Jan 2022 03:43:55 -0500
Received: from kwepemi100010.china.huawei.com (unknown [172.30.72.54]) by
	szxga01-in.huawei.com (SkyGuard) with ESMTP id 4JclmG5FY8zccYZ;
	Mon, 17 Jan 2022 16:43:10 +0800 (CST)
Received: from kwepemm600009.china.huawei.com (7.193.23.164) by
	kwepemi100010.china.huawei.com (7.221.188.54) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id
	15.1.2308.21; Mon, 17 Jan 2022 16:43:53 +0800
Received: from huawei.com (10.175.127.227) by kwepemm600009.china.huawei.com
	(7.193.23.164) with Microsoft SMTP Server (version=TLS1_2,
	cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2308.21;
	Mon, 17 Jan 2022 16:43:53 +0800
From: Yu Kuai
To:
CC: , , ,
Subject: [PATCH RESEND 1/3] blk-mq: add new interfaces to track if hctx
	failed to get driver tag
Date: Mon, 17 Jan 2022 16:54:53 +0800
Message-ID: <20220117085455.2269760-2-yukuai3@huawei.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220117085455.2269760-1-yukuai3@huawei.com>
References: <20220117085455.2269760-1-yukuai3@huawei.com>
MIME-Version: 1.0
X-Originating-IP: [10.175.127.227]
X-ClientProxiedBy: dggems704-chm.china.huawei.com (10.3.19.181) To
	kwepemm600009.china.huawei.com
	(7.193.23.164)
X-CFilter-Loop: Reflected
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

Prepare to allow a hardware queue to get more tags while sharing a tag
set.

Signed-off-by: Yu Kuai
---
 block/blk-mq-debugfs.c |  2 ++
 block/blk-mq-tag.c     | 37 +++++++++++++++++++++++++++++++++++++
 block/blk-mq-tag.h     | 24 +++++++++++++++++++++---
 include/linux/blk-mq.h | 12 ++++++++++++
 include/linux/blkdev.h |  2 ++
 5 files changed, 74 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 3a790eb4995c..3841fe26cda1 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -454,6 +454,8 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
 	seq_printf(m, "nr_reserved_tags=%u\n", tags->nr_reserved_tags);
 	seq_printf(m, "active_queues=%d\n",
 		   atomic_read(&tags->active_queues));
+	seq_printf(m, "pending_queues=%d\n",
+		   atomic_read(&tags->pending_queues));
 
 	seq_puts(m, "\nbitmap_tags:\n");
 	sbitmap_queue_show(&tags->bitmap_tags, m);
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index e55a6834c9a6..77c723bdfd5c 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -73,6 +73,43 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 	blk_mq_tag_wakeup_all(tags, false);
 }
 
+/*
+ * Called when hctx failed to get driver tag
+ */
+void __blk_mq_dtag_wait(struct blk_mq_hw_ctx *hctx)
+{
+	if (blk_mq_is_shared_tags(hctx->flags)) {
+		struct request_queue *q = hctx->queue;
+
+		if (!test_bit(QUEUE_FLAG_HCTX_WAIT, &q->queue_flags) &&
+		    !test_and_set_bit(QUEUE_FLAG_HCTX_WAIT, &q->queue_flags))
+			atomic_inc(&hctx->tags->pending_queues);
+	} else {
+		if (!test_bit(BLK_MQ_S_DTAG_WAIT, &hctx->state) &&
+		    !test_and_set_bit(BLK_MQ_S_DTAG_WAIT, &hctx->state))
+			atomic_inc(&hctx->tags->pending_queues);
+	}
+}
+
+/* Called when busy queue goes inactive */
+void __blk_mq_dtag_idle(struct blk_mq_hw_ctx *hctx)
+{
+	struct blk_mq_tags *tags = hctx->tags;
+
+	if (blk_mq_is_shared_tags(hctx->flags)) {
+		struct request_queue *q = hctx->queue;
+
+		if (!test_and_clear_bit(QUEUE_FLAG_HCTX_WAIT,
+					&q->queue_flags))
+			return;
+	} else {
+		if (!test_and_clear_bit(BLK_MQ_S_DTAG_WAIT, &hctx->state))
+			return;
+	}
+
+	atomic_dec(&tags->pending_queues);
+}
+
 static int __blk_mq_get_tag(struct blk_mq_alloc_data *data,
 			    struct sbitmap_queue *bt)
 {
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 5668e28be0b7..3fe013aee9a2 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -47,15 +47,17 @@ enum {
 	BLK_MQ_TAG_MAX		= BLK_MQ_NO_TAG - 1,
 };
 
-extern bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *);
-extern void __blk_mq_tag_idle(struct blk_mq_hw_ctx *);
+extern bool __blk_mq_tag_wait(struct blk_mq_hw_ctx *hctx);
+extern void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx);
+extern void __blk_mq_dtag_busy(struct blk_mq_hw_ctx *hctx);
+extern void __blk_mq_dtag_idle(struct blk_mq_hw_ctx *hctx);
 
 static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 {
 	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return false;
 
-	return __blk_mq_tag_busy(hctx);
+	return __blk_mq_tag_wait(hctx);
 }
 
 static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
@@ -66,6 +68,22 @@ static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 	__blk_mq_tag_idle(hctx);
 }
 
+static inline void blk_mq_dtag_wait(struct blk_mq_hw_ctx *hctx)
+{
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
+		return;
+
+	__blk_mq_dtag_wait(hctx);
+}
+
+static inline void blk_mq_dtag_idle(struct blk_mq_hw_ctx *hctx)
+{
+	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
+		return;
+
+	__blk_mq_dtag_idle(hctx);
+}
+
 static inline bool blk_mq_tag_is_reserved(struct blk_mq_tags *tags,
 					  unsigned int tag)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d319ffa59354..68b1602d9d60 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -666,6 +666,9 @@ enum {
 	/* hw queue is inactive after all its CPUs become offline */
 	BLK_MQ_S_INACTIVE	= 3,
 
+	/* hw queue is waiting for driver tag */
+	BLK_MQ_S_DTAG_WAIT	= 4,
+
 	BLK_MQ_MAX_DEPTH	= 10240,
 
 	BLK_MQ_CPU_WORK_BATCH	= 8,
@@ -724,7 +727,16 @@ struct blk_mq_tags {
 	unsigned int nr_tags;
 	unsigned int nr_reserved_tags;
 
+	/*
+	 * If multiple queues share a tag set, record the number of queues that
+	 * issued io recently.
+	 */
 	atomic_t active_queues;
+	/*
+	 * If multiple queues share a tag set, record the number of queues that
+	 * can't get driver tag.
+	 */
+	atomic_t pending_queues;
 
 	struct sbitmap_queue bitmap_tags;
 	struct sbitmap_queue breserved_tags;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9c95df26fc26..787bfb18ce79 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -412,6 +412,8 @@ struct request_queue {
 #define QUEUE_FLAG_RQ_ALLOC_TIME 27	/* record rq->alloc_time_ns */
 #define QUEUE_FLAG_HCTX_ACTIVE	28	/* at least one blk-mq hctx is active */
 #define QUEUE_FLAG_NOWAIT	29	/* device supports NOWAIT */
+#define QUEUE_FLAG_HCTX_WAIT	30	/* at least one blk-mq hctx can't get
+					   driver tag */
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_SAME_COMP) |		\

From patchwork Mon Jan 17 08:54:54 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 12714996
From: Yu Kuai
To:
CC: , , ,
Subject: [PATCH RESEND 2/3] blk-mq: record how many hctx failed to get
	driver tag while sharing a tag set
Date: Mon, 17 Jan 2022 16:54:54 +0800
Message-ID: <20220117085455.2269760-3-yukuai3@huawei.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220117085455.2269760-1-yukuai3@huawei.com>
References: <20220117085455.2269760-1-yukuai3@huawei.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

The hctx is recorded when it fails to get a driver tag, and the record
is cleared when the hctx becomes idle. Clearing only on idle is later
than strictly necessary and could be optimized; however, clearing the
record promptly seems rather complicated, so take the simple approach
for now.
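[Editorial note] The wait/idle accounting described above can be sketched in
user space. This is an illustration only, not kernel code: dtag_wait() and
dtag_idle() are hypothetical stand-ins for __blk_mq_dtag_wait() and
__blk_mq_dtag_idle(), with C11 atomics replacing the kernel's test_bit()
and test_and_set_bit() helpers. The point is the idempotent bump: each hctx
contributes to pending_queues at most once until it goes idle, and a cheap
plain read short-circuits the atomic read-modify-write once the flag is set.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel structures. */
struct tags { atomic_int pending_queues; };
struct hctx { atomic_bool dtag_wait; struct tags *tags; };

/* Mirrors __blk_mq_dtag_wait(): called on every failed driver-tag
 * allocation, but bumps the shared counter at most once per hctx.
 * The plain load first avoids an atomic RMW on the hot path once
 * the flag is already set. */
static void dtag_wait(struct hctx *h)
{
	if (!atomic_load(&h->dtag_wait) &&
	    !atomic_exchange(&h->dtag_wait, true))
		atomic_fetch_add(&h->tags->pending_queues, 1);
}

/* Mirrors __blk_mq_dtag_idle(): only an hctx that actually set the
 * flag decrements, so repeated idle calls never drive the counter
 * negative. */
static void dtag_idle(struct hctx *h)
{
	if (atomic_exchange(&h->dtag_wait, false))
		atomic_fetch_sub(&h->tags->pending_queues, 1);
}
```

For example, calling dtag_wait() twice on the same hctx leaves
pending_queues at 1, and dtag_idle() on an hctx that never waited is a
no-op.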
Signed-off-by: Yu Kuai
---
 block/blk-mq-tag.c |  8 +++++++-
 block/blk-mq-tag.h |  6 +++---
 block/blk-mq.c     | 13 ++++++++++---
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 77c723bdfd5c..d4d212c6c32e 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -163,8 +163,11 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	if (tag != BLK_MQ_NO_TAG)
 		goto found_tag;
 
-	if (data->flags & BLK_MQ_REQ_NOWAIT)
+	if (data->flags & BLK_MQ_REQ_NOWAIT) {
+		if (!data->q->elevator)
+			blk_mq_dtag_wait(data->hctx);
 		return BLK_MQ_NO_TAG;
+	}
 
 	ws = bt_wait_ptr(bt, data->hctx);
 	do {
@@ -191,6 +194,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 		if (tag != BLK_MQ_NO_TAG)
 			break;
 
+		if (!data->q->elevator)
+			blk_mq_dtag_wait(data->hctx);
+
 		bt_prev = bt;
 		io_schedule();
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 3fe013aee9a2..d5f98a3e6f91 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -47,9 +47,9 @@ enum {
 	BLK_MQ_TAG_MAX		= BLK_MQ_NO_TAG - 1,
 };
 
-extern bool __blk_mq_tag_wait(struct blk_mq_hw_ctx *hctx);
+extern bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx);
 extern void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx);
-extern void __blk_mq_dtag_busy(struct blk_mq_hw_ctx *hctx);
+extern void __blk_mq_dtag_wait(struct blk_mq_hw_ctx *hctx);
 extern void __blk_mq_dtag_idle(struct blk_mq_hw_ctx *hctx);
 
 static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
@@ -57,7 +57,7 @@ static inline bool blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
 	if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED))
 		return false;
 
-	return __blk_mq_tag_wait(hctx);
+	return __blk_mq_tag_busy(hctx);
 }
 
 static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d73bc219a7fa..8d90e686ee8b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1471,8 +1471,10 @@ static void blk_mq_timeout_work(struct work_struct *work)
 		 */
 		queue_for_each_hw_ctx(q, hctx, i) {
 			/* the hctx may be unmapped, so check it here */
-			if (blk_mq_hw_queue_mapped(hctx))
+			if (blk_mq_hw_queue_mapped(hctx)) {
 				blk_mq_tag_idle(hctx);
+				blk_mq_dtag_idle(hctx);
+			}
 		}
 	}
 	blk_queue_exit(q);
@@ -1569,8 +1571,10 @@ static bool __blk_mq_alloc_driver_tag(struct request *rq)
 	}
 
 	tag = __sbitmap_queue_get(bt);
-	if (tag == BLK_MQ_NO_TAG)
+	if (tag == BLK_MQ_NO_TAG) {
+		blk_mq_dtag_wait(rq->mq_hctx);
 		return false;
+	}
 
 	rq->tag = tag + tag_offset;
 	return true;
@@ -3416,8 +3420,10 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 {
 	struct request *flush_rq = hctx->fq->flush_rq;
 
-	if (blk_mq_hw_queue_mapped(hctx))
+	if (blk_mq_hw_queue_mapped(hctx)) {
 		blk_mq_tag_idle(hctx);
+		blk_mq_dtag_idle(hctx);
+	}
 
 	blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
 			set->queue_depth, flush_rq);
@@ -3743,6 +3749,7 @@ static void queue_set_hctx_shared(struct request_queue *q, bool shared)
 			hctx->flags |= BLK_MQ_F_TAG_QUEUE_SHARED;
 		} else {
 			blk_mq_tag_idle(hctx);
+			blk_mq_dtag_idle(hctx);
 			hctx->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
 		}
 	}

From patchwork Mon Jan 17 08:54:55 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yu Kuai
X-Patchwork-Id: 12714998
From: Yu Kuai
To:
CC: , , ,
Subject: [PATCH RESEND 3/3] blk-mq: allow hardware queue to get more tag
	while sharing a tag set
Date: Mon, 17 Jan 2022 16:54:55 +0800
Message-ID: <20220117085455.2269760-4-yukuai3@huawei.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220117085455.2269760-1-yukuai3@huawei.com>
References: <20220117085455.2269760-1-yukuai3@huawei.com>
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

If multiple queues are active while sharing a tag set, the driver tags
available to each queue are currently limited to a fair share. However,
we found that this causes performance degradation in our environment:
a virtual machine has 12 SCSI disks on the same SCSI host, each backed
by a network disk on the host machine. In the virtual machine, each
disk issues a sg io about every 15s, which keeps active_queues at 12
until the disks go idle (blk_mq_tag_idle() is called), and I/O
performance suffers from the shortage of driver tags during that time.

Thus, as long as no hctx has ever failed to get a driver tag, don't
limit the available driver tags to the fair share. Once some hctx does
fail to get a driver tag, fall back to fair sharing.
Signed-off-by: Yu Kuai
---
 block/blk-mq.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/blk-mq.h b/block/blk-mq.h
index 948791ea2a3e..4b059221b265 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -352,6 +352,10 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
 	if (bt->sb.depth == 1)
 		return true;
 
+	/* Don't use fair share until some hctx failed to get driver tag */
+	if (!atomic_read(&hctx->tags->pending_queues))
+		return true;
+
 	if (blk_mq_is_shared_tags(hctx->flags)) {
 		struct request_queue *q = hctx->queue;
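[Editorial note] The admission decision this hunk changes can be sketched as
plain C. This is an editorial sketch under stated assumptions, not the
kernel's hctx_may_queue() verbatim: the real function also clamps the
per-queue share to a minimum and counts in-flight requests via
__blk_mq_active_requests(); here may_queue() and in_flight are hypothetical
names, and the share is simply the tag-set depth divided (rounding up)
among recently active queues.

```c
/* Sketch of the post-patch admission check: with no recorded
 * driver-tag allocation failures (pending_queues == 0), a queue may
 * exceed its fair share; otherwise the tag-set depth is divided
 * among the recently active queues, as before this patch. */
static int may_queue(unsigned int depth, unsigned int active_queues,
		     unsigned int pending_queues, unsigned int in_flight)
{
	unsigned int share;

	if (depth == 1)		/* single tag: always allow, as before */
		return 1;
	if (!pending_queues)	/* the bypass added by this patch */
		return 1;
	if (!active_queues)	/* no sharing recorded: no limit */
		return 1;
	share = (depth + active_queues - 1) / active_queues; /* round up */
	return in_flight < share;
}
```

For example, with a depth of 32 shared by 4 active queues, a queue holding
30 in-flight requests is admitted while pending_queues is 0, but rejected
once some hctx records a failure (fair share = 8).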