From patchwork Mon Jul 12 17:27:37 2021
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 12371781
From: Jan Kara <jack@suse.cz>
Cc: Paolo Valente, Jens Axboe, mkoutny@suse.cz, Jan Kara
Subject: [PATCH 1/3] block: Provide icq in request allocation data
Date: Mon, 12 Jul 2021 19:27:37 +0200
Message-Id: <20210712172755.2414-1-jack@suse.cz>
In-Reply-To: <20210712171146.12231-1-jack@suse.cz>
References: <20210712171146.12231-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

Currently we look up the ICQ only after the request is allocated. However,
BFQ will want to decide how many scheduler tags it allows a given bfq queue
(effectively a process) to consume based on its cgroup weight. So look up
the ICQ earlier and provide it in struct blk_mq_alloc_data so that BFQ can
use it.
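For orientation, the allocation-side flow after this patch looks roughly
like this (a simplified sketch assembled from the hunks below and the
surrounding allocator code, not a literal excerpt):

/*
 * __blk_mq_alloc_request(data)
 *     data->icq = blk_mq_sched_lookup_icq(q);          // before a tag is taken
 *     e->type->ops.limit_depth(data->cmd_flags, data); // may inspect data->icq
 *     tag = blk_mq_get_tag(data);
 *     blk_mq_rq_ctx_init(data, tag, ...)
 *         blk_mq_sched_assign_ioc(rq, data->icq);      // now only takes a reference
 *         e->type->ops.prepare_request(rq);
 */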
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/blk-mq-sched.c | 18 ++++++++++--------
 block/blk-mq-sched.h |  3 ++-
 block/blk-mq.c       |  7 ++++---
 block/blk-mq.h       |  1 +
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index c838d81ac058..3e34f5bb24ae 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -18,9 +18,8 @@
 #include "blk-mq-tag.h"
 #include "blk-wbt.h"
 
-void blk_mq_sched_assign_ioc(struct request *rq)
+struct io_cq *blk_mq_sched_lookup_icq(struct request_queue *q)
 {
-	struct request_queue *q = rq->q;
 	struct io_context *ioc;
 	struct io_cq *icq;
 
@@ -29,17 +28,20 @@ void blk_mq_sched_assign_ioc(struct request *rq)
 	 */
 	ioc = current->io_context;
 	if (!ioc)
-		return;
+		return NULL;
 
 	spin_lock_irq(&q->queue_lock);
 	icq = ioc_lookup_icq(ioc, q);
 	spin_unlock_irq(&q->queue_lock);
+	if (icq)
+		return icq;
+	return ioc_create_icq(ioc, q, GFP_ATOMIC);
+}
 
-	if (!icq) {
-		icq = ioc_create_icq(ioc, q, GFP_ATOMIC);
-		if (!icq)
-			return;
-	}
-
+void blk_mq_sched_assign_ioc(struct request *rq, struct io_cq *icq)
+{
+	if (!icq)
+		return;
 	get_io_context(icq->ioc);
 	rq->elv.icq = icq;
 }

diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 5246ae040704..4529991e55e6 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -7,7 +7,8 @@
 
 #define MAX_SCHED_RQ (16 * BLKDEV_MAX_RQ)
 
-void blk_mq_sched_assign_ioc(struct request *rq);
+struct io_cq *blk_mq_sched_lookup_icq(struct request_queue *q);
+void blk_mq_sched_assign_ioc(struct request *rq, struct io_cq *icq);
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs, struct request **merged_request);

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2c4ac51e54eb..b9d83644158f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -333,9 +333,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 
 		rq->elv.icq = NULL;
 		if (e && e->type->ops.prepare_request) {
-			if (e->type->icq_cache)
-				blk_mq_sched_assign_ioc(rq);
-
+			blk_mq_sched_assign_ioc(rq, data->icq);
 			e->type->ops.prepare_request(rq);
 			rq->rq_flags |= RQF_ELVPRIV;
 		}
@@ -360,6 +358,9 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
 		data->flags |= BLK_MQ_REQ_NOWAIT;
 
 	if (e) {
+		if (!op_is_flush(data->cmd_flags) && e->type->icq_cache &&
+		    e->type->ops.prepare_request)
+			data->icq = blk_mq_sched_lookup_icq(q);
 		/*
 		 * Flush/passthrough requests are special and go directly to the
 		 * dispatch list. Don't include reserved tags in the

diff --git a/block/blk-mq.h b/block/blk-mq.h
index d08779f77a26..c502232384c6 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -151,6 +151,7 @@ static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
 struct blk_mq_alloc_data {
 	/* input parameter */
 	struct request_queue *q;
+	struct io_cq *icq;
 	blk_mq_req_flags_t flags;
 	unsigned int shallow_depth;
 	unsigned int cmd_flags;

From patchwork Mon Jul 12 17:27:38 2021
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 12371779
From: Jan Kara <jack@suse.cz>
Cc: Paolo Valente, Jens Axboe, mkoutny@suse.cz, Jan Kara
Subject: [PATCH 2/3] bfq: Track number of allocated requests in bfq_entity
Date: Mon, 12 Jul 2021 19:27:38 +0200
Message-Id: <20210712172755.2414-2-jack@suse.cz>
In-Reply-To: <20210712171146.12231-1-jack@suse.cz>
References: <20210712171146.12231-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

When we want to limit the number of requests used by each bfqq and also by
each cgroup, we need to track the number of requests used by each cgroup as
well. So track the number of allocated requests for each bfq_entity.
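As an illustration, here is a minimal userspace model of the hierarchical
counter (the struct and function names are invented for the example and are
not kernel API; in BFQ itself the walk is the for_each_entity() loop added
below):

#include <stdio.h>

/* Toy entity chain mirroring bfqq -> cgroup -> root. */
struct entity {
	struct entity *parent;
	int allocated;		/* requests allocated in this subtree */
};

static void request_allocated(struct entity *e)
{
	for (; e; e = e->parent)	/* like for_each_entity() */
		e->allocated++;
}

static void request_freed(struct entity *e)
{
	for (; e; e = e->parent)
		e->allocated--;
}

int main(void)
{
	struct entity root = { .parent = NULL };
	struct entity cgroup = { .parent = &root };
	struct entity bfqq = { .parent = &cgroup };

	request_allocated(&bfqq);
	request_allocated(&bfqq);
	request_freed(&bfqq);
	/* Every level sees the total for its subtree: prints 1 1 1 */
	printf("%d %d %d\n", bfqq.allocated, cgroup.allocated, root.allocated);
	return 0;
}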
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 28 ++++++++++++++++++++++------
 block/bfq-iosched.h |  5 +++--
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 727955918563..9ef057dc0028 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1113,7 +1113,8 @@ bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd,
 
 static int bfqq_process_refs(struct bfq_queue *bfqq)
 {
-	return bfqq->ref - bfqq->allocated - bfqq->entity.on_st_or_in_serv -
+	return bfqq->ref - bfqq->entity.allocated -
+		bfqq->entity.on_st_or_in_serv -
 		(bfqq->weight_counter != NULL) - bfqq->stable_ref;
 }
 
@@ -5875,6 +5876,22 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	}
 }
 
+static void bfqq_request_allocated(struct bfq_queue *bfqq)
+{
+	struct bfq_entity *entity = &bfqq->entity;
+
+	for_each_entity(entity)
+		entity->allocated++;
+}
+
+static void bfqq_request_freed(struct bfq_queue *bfqq)
+{
+	struct bfq_entity *entity = &bfqq->entity;
+
+	for_each_entity(entity)
+		entity->allocated--;
+}
+
 /* returns true if it causes the idle timer to be disabled */
 static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
 {
@@ -5888,8 +5905,8 @@ static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
 		 * Release the request's reference to the old bfqq
 		 * and make sure one is taken to the shared queue.
 		 */
-		new_bfqq->allocated++;
-		bfqq->allocated--;
+		bfqq_request_allocated(new_bfqq);
+		bfqq_request_freed(bfqq);
 		new_bfqq->ref++;
 		/*
 		 * If the bic associated with the process
@@ -6248,8 +6265,7 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
 
 static void bfq_finish_requeue_request_body(struct bfq_queue *bfqq)
 {
-	bfqq->allocated--;
-
+	bfqq_request_freed(bfqq);
 	bfq_put_queue(bfqq);
 }
 
@@ -6669,7 +6685,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
 		}
 	}
 
-	bfqq->allocated++;
+	bfqq_request_allocated(bfqq);
 	bfqq->ref++;
 	bfq_log_bfqq(bfqd, bfqq, "get_request %p: bfqq %p, %d",
 		     rq, bfqq, bfqq->ref);

diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 99c2a3cb081e..70d4a9b54613 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -170,6 +170,9 @@ struct bfq_entity {
 	/* budget, used also to calculate F_i: F_i = S_i + @budget / @weight */
 	int budget;
 
+	/* Number of requests allocated in the subtree of this entity */
+	int allocated;
+
 	/* device weight, if non-zero, it overrides the default weight of
 	 * bfq_group_data */
 	int dev_weight;
@@ -266,8 +269,6 @@ struct bfq_queue {
 	struct request *next_rq;
 	/* number of sync and async requests queued */
 	int queued[2];
-	/* number of requests currently allocated */
-	int allocated;
 	/* number of pending metadata requests */
 	int meta_pending;
 	/* fifo list of requests in sort_list */
From patchwork Mon Jul 12 17:27:39 2021
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 12371783
From: Jan Kara <jack@suse.cz>
Cc: Paolo Valente, Jens Axboe, mkoutny@suse.cz, Jan Kara
Subject: [PATCH 3/3] bfq: Limit number of requests consumed by each cgroup
Date: Mon, 12 Jul 2021 19:27:39 +0200
Message-Id: <20210712172755.2414-3-jack@suse.cz>
In-Reply-To: <20210712171146.12231-1-jack@suse.cz>
References: <20210712171146.12231-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

When cgroup IO scheduling is used with BFQ, it does not really provide
service differentiation if the cgroup drives a big IO depth. That happens,
for example, with writeback, which asynchronously submits lots of IO, but
it can happen with AIO as well. The problem is that if we have two cgroups
that submit IO with different weights, the cgroup with the higher weight
properly gets more IO time and is able to dispatch more IO. However, this
causes the lower-weight cgroup to accumulate more requests inside BFQ, and
eventually the lower-weight cgroup consumes most of the IO scheduler tags.
At that point the higher-weight cgroup stops getting better service, as it
is mostly blocked waiting for a scheduler tag while its queues inside BFQ
are empty, and thus the lower-weight cgroup gets served.

Check in bfq_limit_depth() how many requests the submitting cgroup has
allocated, and if it consumes more requests than would correspond to its
weight, limit the available depth to 1 so that the cgroup cannot consume
many more requests. With this limitation, the higher-weight cgroup gets
proper service even with writeback.
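The check added below says: an entity may hold at most its
weight-proportional share of the scheduler tags. A small standalone model
of that arithmetic (the numbers are illustrative, not kernel code):

#include <stdbool.h>
#include <stdio.h>

/*
 * Model of the test in bfqq_request_over_limit(): an entity is over its
 * limit once it holds at least limit * weight / wsum requests, i.e. its
 * weight-proportional share of the queue's scheduler tags.
 */
static bool over_limit(int allocated, int limit, int weight, unsigned long wsum)
{
	return allocated >= limit * weight / wsum;
}

int main(void)
{
	int nr_requests = 64;		/* data->q->nr_requests */

	/* Two sibling cgroups with weights 100 and 300, so wsum = 400: */
	printf("weight-100 share: %d tags\n", nr_requests * 100 / 400); /* 16 */
	printf("weight-300 share: %d tags\n", nr_requests * 300 / 400); /* 48 */

	/*
	 * A weight-100 cgroup already holding 20 requests is over its limit,
	 * so bfq_limit_depth() clamps its allocation depth to 1:
	 */
	printf("over limit: %d\n", over_limit(20, nr_requests, 100, 400));
	return 0;
}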
Signed-off-by: Jan Kara <jack@suse.cz>
---
 block/bfq-iosched.c | 54 ++++++++++++++++++++++++++++++---------------
 1 file changed, 36 insertions(+), 18 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 9ef057dc0028..fad54c11c43f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -565,6 +565,22 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd,
 	}
 }
 
+static bool bfqq_request_over_limit(struct bfq_queue *bfqq, int limit)
+{
+	struct bfq_entity *entity = &bfqq->entity;
+
+	for_each_entity(entity) {
+		if (entity->on_st_or_in_serv &&
+		    entity->allocated >= limit * entity->weight /
+					 bfq_entity_service_tree(entity)->wsum) {
+			bfq_log_bfqq(bfqq->bfqd, bfqq, "too many requests: allocated %d limit %d weight %d wsum %lu",
+				entity->allocated, limit, entity->weight, bfq_entity_service_tree(entity)->wsum);
+			return true;
+		}
+	}
+	return false;
+}
+
 /*
  * Async I/O can easily starve sync I/O (both sync reads and sync
  * writes), by consuming all tags. Similarly, storms of sync writes,
@@ -575,16 +591,28 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd,
 static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
 {
 	struct bfq_data *bfqd = data->q->elevator->elevator_data;
+	struct bfq_io_cq *bic = data->icq ? icq_to_bic(data->icq) : NULL;
+	struct bfq_queue *bfqq = bic ? bic_to_bfqq(bic, op_is_sync(op)) : NULL;
+	int depth;
 
+	/* Sync reads have full depth available */
 	if (op_is_sync(op) && !op_is_write(op))
-		return;
+		depth = 0;
+	else
+		depth = bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
 
-	data->shallow_depth =
-		bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
+	/*
+	 * Does queue (or any parent entity) exceed number of requests that
+	 * should be available to it? Heavily limit depth so that it cannot
+	 * consume more available requests and thus starve other entities.
+	 */
+	if (bfqq && bfqq_request_over_limit(bfqq, data->q->nr_requests))
+		depth = 1;
 
 	bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
-		__func__, bfqd->wr_busy_queues, op_is_sync(op),
-		data->shallow_depth);
+		__func__, bfqd->wr_busy_queues, op_is_sync(op), depth);
+	if (depth)
+		data->shallow_depth = depth;
 }
 
 static struct bfq_queue *
@@ -6848,11 +6876,8 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
  * See the comments on bfq_limit_depth for the purpose of
  * the depths set in the function. Return minimum shallow depth we'll use.
  */
-static unsigned int bfq_update_depths(struct bfq_data *bfqd,
-				      struct sbitmap_queue *bt)
+static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
 {
-	unsigned int i, j, min_shallow = UINT_MAX;
-
 	/*
 	 * In-word depths if no bfq_queue is being weight-raised:
 	 * leaving 25% of tags only for sync reads.
@@ -6883,22 +6908,15 @@ static unsigned int bfq_update_depths(struct bfq_data *bfqd,
 	bfqd->word_depths[1][0] = max(((1U << bt->sb.shift) * 3) >> 4, 1U);
 	/* no more than ~37% of tags for sync writes (~20% extra tags) */
 	bfqd->word_depths[1][1] = max(((1U << bt->sb.shift) * 6) >> 4, 1U);
-
-	for (i = 0; i < 2; i++)
-		for (j = 0; j < 2; j++)
-			min_shallow = min(min_shallow, bfqd->word_depths[i][j]);
-
-	return min_shallow;
 }
 
 static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
 {
 	struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
 	struct blk_mq_tags *tags = hctx->sched_tags;
-	unsigned int min_shallow;
 
-	min_shallow = bfq_update_depths(bfqd, tags->bitmap_tags);
-	sbitmap_queue_min_shallow_depth(tags->bitmap_tags, min_shallow);
+	bfq_update_depths(bfqd, tags->bitmap_tags);
+	sbitmap_queue_min_shallow_depth(tags->bitmap_tags, 1);
 }
 
 static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
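For intuition about the depths kept above: with a 64-bit sbitmap word
(shift = 6), the two weight-raised limits visible in this hunk work out as
below (a standalone model of just those two formulas; the shift value is an
assumption for the example):

#include <stdio.h>

int main(void)
{
	unsigned int shift = 6;	/* assume a 64-bit sbitmap word */
	unsigned int async_wr = ((1U << shift) * 3) >> 4;	/* 12 */
	unsigned int sync_wr  = ((1U << shift) * 6) >> 4;	/* 24 */

	printf("async: %u/64 (~%u%%), sync writes: %u/64 (~%u%%)\n",
	       async_wr, async_wr * 100 / 64, sync_wr, sync_wr * 100 / 64);
	/*
	 * bfq_limit_depth() can now clamp the effective depth of any queue
	 * to 1 at runtime, which is why the sbitmap minimum shallow depth
	 * is pinned to 1 instead of the smallest word_depths[][] value.
	 */
	return 0;
}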