From patchwork Wed Oct 6 17:31:40 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 12539935
From: Jan Kara
To: Paolo Valente
Cc: , Jens Axboe, Michal Koutný, Jan Kara
Subject: [PATCH 1/8] block: Provide icq in request allocation data
Date: Wed, 6 Oct 2021 19:31:40 +0200
Message-Id: <20211006173157.6906-1-jack@suse.cz>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20211006164110.10817-1-jack@suse.cz>
References: <20211006164110.10817-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

Currently we look up the ICQ only after the request is allocated.
However, BFQ will want to decide how many scheduler tags it allows a
given bfq queue (effectively a process) to consume based on cgroup
weight. So look up the ICQ earlier and provide it in struct
blk_mq_alloc_data so that BFQ can use it.
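The reordering described above can be modeled in a few lines of userspace C. This is a hedged sketch, not kernel code: every type and function below (model_lookup_icq(), model_limit_depth(), model_assign_ioc(), model_alloc_request()) is a hypothetical stand-in for blk_mq_sched_lookup_icq(), a scheduler's depth-limiting hook, and blk_mq_sched_assign_ioc(). It only illustrates the ordering: the icq is looked up (or created) before the tag-depth decision, carried in the allocation data, and then attached to the request without a second lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not the kernel definitions. */
struct io_cq {
	int ioc_refs;              /* models the io_context reference count */
};

struct io_context {
	struct io_cq *icq;         /* cached icq for "the" queue, or NULL */
	struct io_cq storage;      /* backing store used when "creating" one */
};

struct alloc_data {
	struct io_cq *icq;         /* models the new blk_mq_alloc_data::icq */
	bool is_flush;             /* models op_is_flush(data->cmd_flags) */
	bool hook_saw_icq;         /* records what the depth hook observed */
};

struct request {
	struct io_cq *icq;         /* models rq->elv.icq */
};

/* Models blk_mq_sched_lookup_icq(): look up, else create; NULL without ioc. */
static struct io_cq *model_lookup_icq(struct io_context *ioc)
{
	if (!ioc)
		return NULL;
	if (ioc->icq)
		return ioc->icq;
	ioc->icq = &ioc->storage;  /* stands in for ioc_create_icq() */
	return ioc->icq;
}

/* Models blk_mq_sched_assign_ioc(rq, icq): tolerate NULL, take a reference. */
static void model_assign_ioc(struct request *rq, struct io_cq *icq)
{
	if (!icq)
		return;
	icq->ioc_refs++;
	rq->icq = icq;
}

/* Models a scheduler's depth-limiting hook, which runs before tag allocation. */
static void model_limit_depth(struct alloc_data *data)
{
	data->hook_saw_icq = data->icq != NULL;
}

static void model_alloc_request(struct alloc_data *data, struct request *rq,
				struct io_context *ioc)
{
	assert(data != NULL && rq != NULL);
	data->icq = NULL;
	if (!data->is_flush)
		data->icq = model_lookup_icq(ioc);  /* looked up first ... */
	model_limit_depth(data);                    /* ... so the hook sees it */
	/* tag allocation would happen here */
	rq->icq = NULL;
	model_assign_ioc(rq, data->icq);            /* no second lookup */
}
```

Flush requests skip the lookup in this model, mirroring the op_is_flush() check the patch adds to __blk_mq_alloc_request().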
Signed-off-by: Jan Kara
---
 block/blk-mq-sched.c | 18 ++++++++++--------
 block/blk-mq-sched.h |  3 ++-
 block/blk-mq.c       |  7 ++++---
 block/blk-mq.h       |  1 +
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 0f006cabfd91..bbb6a677fdde 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -18,9 +18,8 @@
 #include "blk-mq-tag.h"
 #include "blk-wbt.h"
 
-void blk_mq_sched_assign_ioc(struct request *rq)
+struct io_cq *blk_mq_sched_lookup_icq(struct request_queue *q)
 {
-	struct request_queue *q = rq->q;
 	struct io_context *ioc;
 	struct io_cq *icq;
 
@@ -29,17 +28,20 @@ void blk_mq_sched_assign_ioc(struct request *rq)
 	 */
 	ioc = current->io_context;
 	if (!ioc)
-		return;
+		return NULL;
 
 	spin_lock_irq(&q->queue_lock);
 	icq = ioc_lookup_icq(ioc, q);
 	spin_unlock_irq(&q->queue_lock);
+	if (icq)
+		return icq;
+	return ioc_create_icq(ioc, q, GFP_ATOMIC);
+}
 
-	if (!icq) {
-		icq = ioc_create_icq(ioc, q, GFP_ATOMIC);
-		if (!icq)
-			return;
-	}
+void blk_mq_sched_assign_ioc(struct request *rq, struct io_cq *icq)
+{
+	if (!icq)
+		return;
 	get_io_context(icq->ioc);
 	rq->elv.icq = icq;
 }
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index 5246ae040704..4529991e55e6 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -7,7 +7,8 @@
 
 #define MAX_SCHED_RQ (16 * BLKDEV_MAX_RQ)
 
-void blk_mq_sched_assign_ioc(struct request *rq);
+struct io_cq *blk_mq_sched_lookup_icq(struct request_queue *q);
+void blk_mq_sched_assign_ioc(struct request *rq, struct io_cq *icq);
 
 bool blk_mq_sched_try_merge(struct request_queue *q, struct bio *bio,
 		unsigned int nr_segs, struct request **merged_request);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 108a352051be..bf7dfd36d327 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -333,9 +333,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 
 		rq->elv.icq = NULL;
 		if (e && e->type->ops.prepare_request) {
-			if (e->type->icq_cache)
-				blk_mq_sched_assign_ioc(rq);
-
+			blk_mq_sched_assign_ioc(rq, data->icq);
 			e->type->ops.prepare_request(rq);
 			rq->rq_flags |= RQF_ELVPRIV;
 		}
@@ -360,6 +358,9 @@ static struct request *__blk_mq_alloc_request(struct blk_mq_alloc_data *data)
 		data->flags |= BLK_MQ_REQ_NOWAIT;
 
 	if (e) {
+		if (!op_is_flush(data->cmd_flags) && e->type->icq_cache &&
+		    e->type->ops.prepare_request)
+			data->icq = blk_mq_sched_lookup_icq(q);
 		/*
 		 * Flush/passthrough requests are special and go directly to the
 		 * dispatch list. Don't include reserved tags in the
diff --git a/block/blk-mq.h b/block/blk-mq.h
index d08779f77a26..c502232384c6 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -151,6 +151,7 @@ static inline struct blk_mq_ctx *blk_mq_get_ctx(struct request_queue *q)
 struct blk_mq_alloc_data {
 	/* input parameter */
 	struct request_queue *q;
+	struct io_cq *icq;
 	blk_mq_req_flags_t flags;
 	unsigned int shallow_depth;
 	unsigned int cmd_flags;

From patchwork Wed Oct 6 17:31:41 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 12539933
From: Jan Kara
To: Paolo Valente
Cc: , Jens Axboe, Michal Koutný, Jan Kara
Subject: [PATCH 2/8] bfq: Track number of allocated requests in bfq_entity
Date: Wed, 6 Oct 2021 19:31:41 +0200
Message-Id: <20211006173157.6906-2-jack@suse.cz>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20211006164110.10817-1-jack@suse.cz>
References: <20211006164110.10817-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

When we want to limit the number of requests used by each bfqq and
also by each cgroup, we also need to track the number of requests used
by each cgroup. So track the number of allocated requests for each
bfq_entity.

Signed-off-by: Jan Kara
---
 block/bfq-iosched.c | 28 ++++++++++++++++++++++------
 block/bfq-iosched.h |  5 +++--
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 480e1a134859..4d9e04edb614 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1113,7 +1113,8 @@ bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd,
 
 static int bfqq_process_refs(struct bfq_queue *bfqq)
 {
-	return bfqq->ref - bfqq->allocated - bfqq->entity.on_st_or_in_serv -
+	return bfqq->ref - bfqq->entity.allocated -
+		bfqq->entity.on_st_or_in_serv -
 		(bfqq->weight_counter != NULL) - bfqq->stable_ref;
 }
 
@@ -5878,6 +5879,22 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	}
 }
 
+static void bfqq_request_allocated(struct bfq_queue *bfqq)
+{
+	struct bfq_entity *entity = &bfqq->entity;
+
+	for_each_entity(entity)
+		entity->allocated++;
+}
+
+static void bfqq_request_freed(struct bfq_queue *bfqq)
+{
+	struct bfq_entity *entity = &bfqq->entity;
+
+	for_each_entity(entity)
+		entity->allocated--;
+}
+
 /* returns true if it causes the idle timer to be disabled */
 static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
 {
@@ -5891,8 +5908,8 @@ static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
 		 * Release the request's reference to the old bfqq
 		 * and make sure one is taken to the shared queue.
 		 */
-		new_bfqq->allocated++;
-		bfqq->allocated--;
+		bfqq_request_allocated(new_bfqq);
+		bfqq_request_freed(bfqq);
 		new_bfqq->ref++;
 		/*
 		 * If the bic associated with the process
@@ -6251,8 +6268,7 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
 
 static void bfq_finish_requeue_request_body(struct bfq_queue *bfqq)
 {
-	bfqq->allocated--;
-
+	bfqq_request_freed(bfqq);
 	bfq_put_queue(bfqq);
 }
 
@@ -6672,7 +6688,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
 		}
 	}
 
-	bfqq->allocated++;
+	bfqq_request_allocated(bfqq);
 	bfqq->ref++;
 	bfq_log_bfqq(bfqd, bfqq, "get_request %p: bfqq %p, %d",
 		     rq, bfqq, bfqq->ref);
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index a73488eec8a4..3787cfb0febb 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -170,6 +170,9 @@ struct bfq_entity {
 	/* budget, used also to calculate F_i: F_i = S_i + @budget / @weight */
 	int budget;
 
+	/* Number of requests allocated in the subtree of this entity */
+	int allocated;
+
 	/* device weight, if non-zero, it overrides the default weight of
 	 * bfq_group_data */
 	int dev_weight;
@@ -266,8 +269,6 @@ struct bfq_queue {
 	struct request *next_rq;
 	/* number of sync and async requests queued */
 	int queued[2];
-	/* number of requests currently allocated */
-	int allocated;
 	/* number of pending metadata requests */
 	int meta_pending;
 	/* fifo list of requests in sort_list */

From patchwork Wed Oct 6 17:31:42 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 12539925
From: Jan Kara
To: Paolo Valente
Cc: , Jens Axboe, Michal Koutný, Jan Kara
Subject: [PATCH 3/8] bfq: Store full bitmap depth in bfq_data
Date: Wed, 6 Oct 2021 19:31:42 +0200
Message-Id: <20211006173157.6906-3-jack@suse.cz>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20211006164110.10817-1-jack@suse.cz>
References: <20211006164110.10817-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

Store the bitmap depth shift inside bfq_data so that we can use it in
bfq_limit_depth() for proportioning when limiting the number of
available request tags for a cgroup.

Signed-off-by: Jan Kara
---
 block/bfq-iosched.c | 10 ++++++----
 block/bfq-iosched.h |  1 +
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 4d9e04edb614..c93a74b8e9c8 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6855,7 +6855,9 @@ static unsigned int bfq_update_depths(struct bfq_data *bfqd,
 				       struct sbitmap_queue *bt)
 {
 	unsigned int i, j, min_shallow = UINT_MAX;
+	unsigned int depth = 1U << bt->sb.shift;
 
+	bfqd->full_depth_shift = bt->sb.shift;
 	/*
 	 * In-word depths if no bfq_queue is being weight-raised:
 	 * leaving 25% of tags only for sync reads.
@@ -6867,13 +6869,13 @@ static unsigned int bfq_update_depths(struct bfq_data *bfqd,
 	 * limit 'something'.
 	 */
 	/* no more than 50% of tags for async I/O */
-	bfqd->word_depths[0][0] = max((1U << bt->sb.shift) >> 1, 1U);
+	bfqd->word_depths[0][0] = max(depth >> 1, 1U);
 	/*
 	 * no more than 75% of tags for sync writes (25% extra tags
 	 * w.r.t. async I/O, to prevent async I/O from starving sync
 	 * writes)
 	 */
-	bfqd->word_depths[0][1] = max(((1U << bt->sb.shift) * 3) >> 2, 1U);
+	bfqd->word_depths[0][1] = max((depth * 3) >> 2, 1U);
 
 	/*
 	 * In-word depths in case some bfq_queue is being weight-
@@ -6883,9 +6885,9 @@ static unsigned int bfq_update_depths(struct bfq_data *bfqd,
 	 * shortage.
 	 */
 	/* no more than ~18% of tags for async I/O */
-	bfqd->word_depths[1][0] = max(((1U << bt->sb.shift) * 3) >> 4, 1U);
+	bfqd->word_depths[1][0] = max((depth * 3) >> 4, 1U);
 	/* no more than ~37% of tags for sync writes (~20% extra tags) */
-	bfqd->word_depths[1][1] = max(((1U << bt->sb.shift) * 6) >> 4, 1U);
+	bfqd->word_depths[1][1] = max((depth * 6) >> 4, 1U);
 
 	for (i = 0; i < 2; i++)
 		for (j = 0; j < 2; j++)
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 3787cfb0febb..820cb8c2d1fe 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -769,6 +769,7 @@ struct bfq_data {
 	 * function)
 	 */
 	unsigned int word_depths[2][2];
+	unsigned int full_depth_shift;
 };
 
 enum bfqq_state_flags {

From patchwork Wed Oct 6 17:31:43 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jan Kara
X-Patchwork-Id: 12539931
From: Jan Kara
To: Paolo Valente
Cc: , Jens Axboe, Michal Koutný, Jan Kara
Subject: [PATCH 4/8] bfq: Limit number of requests consumed by each cgroup
Date: Wed, 6 Oct 2021 19:31:43 +0200
Message-Id: <20211006173157.6906-4-jack@suse.cz>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20211006164110.10817-1-jack@suse.cz>
References: <20211006164110.10817-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

When cgroup IO scheduling is used with BFQ, it does not really provide
service differentiation if the cgroup drives a big IO depth. That
happens, for example, with writeback, which asynchronously submits lots
of IO, but it can happen with AIO as well. The problem is that if we
have two cgroups that submit IO with different weights, the cgroup with
the higher weight properly gets more IO time and is able to dispatch
more IO. However, this causes the lower-weight cgroup to accumulate
more requests inside BFQ, and eventually the lower-weight cgroup
consumes most of the IO scheduler tags. At that point the higher-weight
cgroup stops getting better service, as it is mostly blocked waiting
for a scheduler tag while its queues inside BFQ are empty, and thus the
lower-weight cgroup gets served.

Check how many requests the submitting cgroup has allocated in
bfq_limit_depth(), and if it consumes more requests than would
correspond to its weight, limit the available depth to 1 so that the
cgroup cannot consume many more requests. With this limitation the
higher-weight cgroup gets proper service even with writeback.
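As a rough illustration of the limit described above, the following userspace C sketch computes a weight-proportional request budget level by level, from the top-level cgroup down to the queue, and reports whether any level has already allocated its share. It is a simplified model under stated assumptions: struct level_info, base_limit(), leaf_wsum(), and over_limit() are hypothetical names; the caller is assumed to supply each level's weight, service-tree weight sum, and allocated count; MODEL_IOPRIO_LEVELS is assumed to equal the kernel's IOPRIO_BE_NR; and locking plus the per-level "entity is active" check are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match the kernel's IOPRIO_BE_NR (number of BE priority levels). */
#define MODEL_IOPRIO_LEVELS 8UL

/* Hypothetical per-level snapshot of one entity on the cgroup path. */
struct level_info {
	unsigned int weight;   /* the entity's weight */
	unsigned long wsum;    /* weight sum of the service tree it sits on */
	int allocated;         /* requests allocated in the entity's subtree */
};

/* Round-to-nearest division for non-negative values (like DIV_ROUND_CLOSEST). */
static long div_round_closest(long n, long d)
{
	return (n + d / 2) / d;
}

/* Starting budget: sync reads get all requests, others a word-depth share. */
static long base_limit(unsigned int nr_requests, unsigned int word_depth,
		       unsigned int full_depth_shift, bool sync_read)
{
	if (sync_read)
		return nr_requests;
	return (long)((nr_requests * word_depth) >> full_depth_shift);
}

/*
 * Queue-level weight sum: each higher-priority class's sum is multiplied by
 * MODEL_IOPRIO_LEVELS once per lower class, as in the patch's loop.
 */
static unsigned long leaf_wsum(const unsigned long *class_wsum, int class_idx)
{
	unsigned long wsum = 0;

	for (int i = 0; i <= class_idx; i++)
		wsum = wsum * MODEL_IOPRIO_LEVELS + class_wsum[i];
	return wsum;
}

/*
 * path[0] is the top-level cgroup entity, path[depth - 1] the queue itself.
 * Shrink the budget by weight/wsum at every level; report whether any level
 * already holds at least its share.
 */
static bool over_limit(const struct level_info *path, int depth, long limit)
{
	for (int level = 0; level < depth; level++) {
		assert(path[level].wsum > 0);
		limit = div_round_closest(limit * (long)path[level].weight,
					  (long)path[level].wsum);
		if (path[level].allocated >= limit)
			return true;
	}
	return false;
}
```

For example, with nr_requests = 64, a bitmap word shift of 6 and an async word depth of 32, base_limit() gives 32; a cgroup with weight 100 on a service tree whose weight sum is 300 then gets DIV_ROUND_CLOSEST(32 * 100, 300) = 11 requests before its depth would be trimmed to 1.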
Signed-off-by: Jan Kara Reviewed-by: Michal Koutný --- block/bfq-iosched.c | 137 ++++++++++++++++++++++++++++++++++++++------ 1 file changed, 118 insertions(+), 19 deletions(-) diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index c93a74b8e9c8..3806409610ca 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -565,26 +565,134 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd, } } +#define BFQ_LIMIT_INLINE_DEPTH 16 + +#ifdef CONFIG_BFQ_GROUP_IOSCHED +static bool bfqq_request_over_limit(struct bfq_queue *bfqq, int limit) +{ + struct bfq_data *bfqd = bfqq->bfqd; + struct bfq_entity *entity = &bfqq->entity; + struct bfq_entity *inline_entities[BFQ_LIMIT_INLINE_DEPTH]; + struct bfq_entity **entities = inline_entities; + int depth, level; + int class_idx = bfqq->ioprio_class - 1; + struct bfq_sched_data *sched_data; + unsigned long wsum; + bool ret = false; + + if (!entity->on_st_or_in_serv) + return false; + + /* +1 for bfqq entity, root cgroup not included */ + depth = bfqg_to_blkg(bfqq_group(bfqq))->blkcg->css.cgroup->level + 1; + if (depth > BFQ_LIMIT_INLINE_DEPTH) { + entities = kmalloc_array(depth, sizeof(*entities), GFP_NOIO); + if (!entities) + return false; + } + + spin_lock_irq(&bfqd->lock); + sched_data = entity->sched_data; + /* Gather our ancestors as we need to traverse them in reverse order */ + level = 0; + for_each_entity(entity) { + /* + * If at some level entity is not even active, allow request + * queueing so that BFQ knows there's work to do and activate + * entities. + */ + if (!entity->on_st_or_in_serv) + goto out; + /* Uh, more parents than cgroup subsystem thinks? 
*/ + if (WARN_ON_ONCE(level >= depth)) + break; + entities[level++] = entity; + } + WARN_ON_ONCE(level != depth); + for (level--; level >= 0; level--) { + entity = entities[level]; + if (level > 0) { + wsum = bfq_entity_service_tree(entity)->wsum; + } else { + int i; + /* + * For bfqq itself we take into account service trees + * of all higher priority classes and multiply their + * weights so that low prio queue from higher class + * gets more requests than high prio queue from lower + * class. + */ + wsum = 0; + for (i = 0; i <= class_idx; i++) { + wsum = wsum * IOPRIO_BE_NR + + sched_data->service_tree[i].wsum; + } + } + limit = DIV_ROUND_CLOSEST(limit * entity->weight, wsum); + if (entity->allocated >= limit) { + bfq_log_bfqq(bfqq->bfqd, bfqq, + "too many requests: allocated %d limit %d level %d", + entity->allocated, limit, level); + ret = true; + break; + } + } +out: + spin_unlock_irq(&bfqd->lock); + if (entities != inline_entities) + kfree(entities); + return ret; +} +#else +static bool bfqq_request_over_limit(struct bfq_queue *bfqq, int limit) +{ + return false; +} +#endif + /* * Async I/O can easily starve sync I/O (both sync reads and sync * writes), by consuming all tags. Similarly, storms of sync writes, * such as those that sync(2) may trigger, can starve sync reads. * Limit depths of async I/O and sync writes so as to counter both * problems. + * + * Also if a bfq queue or its parent cgroup consume more tags than would be + * appropriate for their weight, we trim the available tag depth to 1. This + * avoids a situation where one cgroup can starve another cgroup from tags and + * thus block service differentiation among cgroups. Note that because the + * queue / cgroup already has many requests allocated and queued, this does not + * significantly affect service guarantees coming from the BFQ scheduling + * algorithm. 
*/ static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data) { struct bfq_data *bfqd = data->q->elevator->elevator_data; + struct bfq_io_cq *bic = data->icq ? icq_to_bic(data->icq) : NULL; + struct bfq_queue *bfqq = bic ? bic_to_bfqq(bic, op_is_sync(op)) : NULL; + int depth; + unsigned limit = data->q->nr_requests; + + /* Sync reads have full depth available */ + if (op_is_sync(op) && !op_is_write(op)) { + depth = 0; + } else { + depth = bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)]; + limit = (limit * depth) >> bfqd->full_depth_shift; + } - if (op_is_sync(op) && !op_is_write(op)) - return; - - data->shallow_depth = - bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)]; + /* + * Does queue (or any parent entity) exceed number of requests that + * should be available to it? Heavily limit depth so that it cannot + * consume more available requests and thus starve other entities. + */ + if (bfqq && bfqq_request_over_limit(bfqq, limit)) + depth = 1; bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u", - __func__, bfqd->wr_busy_queues, op_is_sync(op), - data->shallow_depth); + __func__, bfqd->wr_busy_queues, op_is_sync(op), depth); + if (depth) + data->shallow_depth = depth; } static struct bfq_queue * @@ -6851,10 +6959,8 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg) * See the comments on bfq_limit_depth for the purpose of * the depths set in the function. Return minimum shallow depth we'll use. 
 */
-static unsigned int bfq_update_depths(struct bfq_data *bfqd,
-				      struct sbitmap_queue *bt)
+static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
 {
-	unsigned int i, j, min_shallow = UINT_MAX;
 	unsigned int depth = 1U << bt->sb.shift;

 	bfqd->full_depth_shift = bt->sb.shift;
@@ -6888,22 +6994,15 @@ static unsigned int bfq_update_depths(struct bfq_data *bfqd,
 	bfqd->word_depths[1][0] = max((depth * 3) >> 4, 1U);
 	/* no more than ~37% of tags for sync writes (~20% extra tags) */
 	bfqd->word_depths[1][1] = max((depth * 6) >> 4, 1U);
-
-	for (i = 0; i < 2; i++)
-		for (j = 0; j < 2; j++)
-			min_shallow = min(min_shallow, bfqd->word_depths[i][j]);
-
-	return min_shallow;
 }

 static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
 {
 	struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
 	struct blk_mq_tags *tags = hctx->sched_tags;
-	unsigned int min_shallow;

-	min_shallow = bfq_update_depths(bfqd, tags->bitmap_tags);
-	sbitmap_queue_min_shallow_depth(tags->bitmap_tags, min_shallow);
+	bfq_update_depths(bfqd, tags->bitmap_tags);
+	sbitmap_queue_min_shallow_depth(tags->bitmap_tags, 1);
 }

 static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)

From patchwork Wed Oct 6 17:31:44 2021
X-Patchwork-Id: 12539929
From: Jan Kara
To: Paolo Valente
Cc: Jens Axboe, Michal Koutný, Jan Kara
Subject: [PATCH 5/8] bfq: Limit waker detection in time
Date: Wed, 6 Oct 2021 19:31:44 +0200
Message-Id: <20211006173157.6906-5-jack@suse.cz>
In-Reply-To: <20211006164110.10817-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

Currently, when process A starts issuing requests shortly after process B
has completed some IO three times in a row, we decide that B is a "waker"
of A, meaning that completing B's IO is needed for A to make progress, and
we generally stop separating A's and B's IO much. This logic is useful to
avoid unnecessary idling, and thus throughput loss, in cases where the
workload needs to switch, e.g., between a process and the journaling
thread doing IO.

However, the detection heuristic frequently gives false positives when A
and B are competing for IO bandwidth and other processes aren't doing much
IO: we are basically bound to eventually accumulate three occurrences of a
situation where one process starts issuing requests after the other has
completed some IO.

To reduce these false positives, also cancel the waker detection if we
didn't accumulate three detected wakeups within a given timeout. The
rationale is that if wakeups are really rare, the pointless idling doesn't
hurt throughput that much anyway.
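For illustration only, the timeout rule described above can be modeled in userspace C. This is a hypothetical sketch: struct queue and check_waker() are stand-ins rather than kernel code, and they omit the other early-return filters of bfq_check_waker() (such as the think-time check). It assumes the default bfq slice_idle of 8 ms (NSEC_PER_SEC / 125), which makes the detection window 128 * 8 ms = 1.024 s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the bfq_queue fields used by waker detection. */
struct queue {
	const struct queue *waker;           /* confirmed waker (waker_bfqq) */
	const struct queue *tentative_waker; /* candidate (tentative_waker_bfqq) */
	unsigned int num_waker_detections;
	uint64_t waker_detection_started;    /* ns; the field this patch adds */
};

/* Assumed default idle slice: NSEC_PER_SEC / 125 = 8 ms. */
#define SLICE_IDLE_NS (1000000000ULL / 125)

/*
 * Called when q becomes non-empty right after a request of last_completed
 * has completed. Follows the structure of bfq_check_waker() after this
 * patch, without its think-time and other early-return filters.
 */
static void check_waker(struct queue *q, const struct queue *last_completed,
			uint64_t now_ns)
{
	if (!last_completed || last_completed == q ||
	    last_completed == q->waker)
		return;

	if (last_completed != q->tentative_waker ||
	    now_ns > q->waker_detection_started + 128 * SLICE_IDLE_NS) {
		/* New candidate, or the 1.024 s window expired: restart. */
		q->tentative_waker = last_completed;
		q->num_waker_detections = 1;
		q->waker_detection_started = now_ns;
	} else {
		/* Same candidate seen again inside the window. */
		q->num_waker_detections++;
	}

	if (q->num_waker_detections == 3) {
		q->waker = last_completed;
		q->tentative_waker = NULL;
	}
	/* A confirmed waker returns early above, so we never count past 3. */
	assert(q->num_waker_detections <= 3);
}
```

With three candidate wakeups at 0, 0.5 and 1.0 s the waker is confirmed; spreading the same three wakeups over 0, 0.6 and 1.2 s instead restarts detection at the third event, so no waker is set.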
This significantly reduces false waker detection for workloads like:

[global]
directory=/mnt/repro/
rw=write
size=8g
time_based
runtime=30
ramp_time=10
blocksize=1m
direct=0
ioengine=sync

[slowwriter]
numjobs=1
fsync=200

[fastwriter]
numjobs=1
fsync=200

Signed-off-by: Jan Kara
---
 block/bfq-iosched.c | 38 +++++++++++++++++++++++---------------
 block/bfq-iosched.h |  2 ++
 2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 3806409610ca..6c5e9bafdb5d 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2091,20 +2091,19 @@ static void bfq_update_io_intensity(struct bfq_queue *bfqq, u64 now_ns)
  * aspect, see the comments on the choice of the queue for injection
  * in bfq_select_queue().
  *
- * Turning back to the detection of a waker queue, a queue Q is deemed
- * as a waker queue for bfqq if, for three consecutive times, bfqq
- * happens to become non empty right after a request of Q has been
- * completed. In this respect, even if bfqq is empty, we do not check
- * for a waker if it still has some in-flight I/O. In fact, in this
- * case bfqq is actually still being served by the drive, and may
- * receive new I/O on the completion of some of the in-flight
- * requests. In particular, on the first time, Q is tentatively set as
- * a candidate waker queue, while on the third consecutive time that Q
- * is detected, the field waker_bfqq is set to Q, to confirm that Q is
- * a waker queue for bfqq. These detection steps are performed only if
- * bfqq has a long think time, so as to make it more likely that
- * bfqq's I/O is actually being blocked by a synchronization. This
- * last filter, plus the above three-times requirement, make false
+ * Turning back to the detection of a waker queue, a queue Q is deemed as a
+ * waker queue for bfqq if, for three consecutive times, bfqq happens to become
+ * non empty right after a request of Q has been completed within given
+ * timeout. In this respect, even if bfqq is empty, we do not check for a waker
+ * if it still has some in-flight I/O. In fact, in this case bfqq is actually
+ * still being served by the drive, and may receive new I/O on the completion
+ * of some of the in-flight requests. In particular, on the first time, Q is
+ * tentatively set as a candidate waker queue, while on the third consecutive
+ * time that Q is detected, the field waker_bfqq is set to Q, to confirm that Q
+ * is a waker queue for bfqq. These detection steps are performed only if bfqq
+ * has a long think time, so as to make it more likely that bfqq's I/O is
+ * actually being blocked by a synchronization. This last filter, plus the
+ * above three-times requirement and time limit for detection, make false
  * positives less likely.
  *
  * NOTE
@@ -2136,8 +2135,16 @@ static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	    bfqd->last_completed_rq_bfqq == bfqq->waker_bfqq)
 		return;

+	/*
+	 * We reset waker detection logic also if too much time has passed
+	 * since the first detection. If wakeups are rare, pointless idling
+	 * doesn't hurt throughput that much. The condition below makes sure
+	 * we do not uselessly idle blocking waker in more than 1/64 cases.
+	 */
 	if (bfqd->last_completed_rq_bfqq !=
-	    bfqq->tentative_waker_bfqq) {
+	    bfqq->tentative_waker_bfqq ||
+	    now_ns > bfqq->waker_detection_started +
+					128 * (u64)bfqd->bfq_slice_idle) {
 		/*
 		 * First synchronization detected with a
 		 * candidate waker queue, or with a different
@@ -2146,6 +2153,7 @@ static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 		bfqq->tentative_waker_bfqq =
 			bfqd->last_completed_rq_bfqq;
 		bfqq->num_waker_detections = 1;
+		bfqq->waker_detection_started = now_ns;
 	} else /* Same tentative waker queue detected again */
 		bfqq->num_waker_detections++;

diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 820cb8c2d1fe..bb8180c52a31 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -388,6 +388,8 @@ struct bfq_queue {
 	struct bfq_queue *tentative_waker_bfqq;
 	/* number of times the same tentative waker has been detected */
 	unsigned int num_waker_detections;
+	/* time when we started considering this waker */
+	u64 waker_detection_started;

 	/* node for woken_list, see below */
 	struct hlist_node woken_list_node;

From patchwork Wed Oct 6 17:31:45 2021
X-Patchwork-Id: 12539939
From: Jan Kara
To: Paolo Valente
Cc: Jens Axboe, Michal Koutný, Jan Kara
Subject: [PATCH 6/8] bfq: Provide helper to generate bfqq name
Date: Wed, 6 Oct 2021 19:31:45 +0200
Message-Id: <20211006173157.6906-6-jack@suse.cz>
In-Reply-To: <20211006164110.10817-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

Instead of having a helper that formats the bfqq pid, provide a helper to
generate the full bfqq name as used in the traces. This saves some code
duplication and will save more in the upcoming tracepoints.

Signed-off-by: Jan Kara
---
 block/bfq-iosched.h | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index bb8180c52a31..07288b9da389 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -25,7 +25,7 @@
 #define BFQ_DEFAULT_GRP_IOPRIO	0
 #define BFQ_DEFAULT_GRP_CLASS	IOPRIO_CLASS_BE

-#define MAX_PID_STR_LENGTH 12
+#define MAX_BFQQ_NAME_LENGTH 16

 /*
  * Soft real-time applications are extremely more latency sensitive
@@ -1083,26 +1083,27 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq);
 /* --------------- end of interface of B-WF2Q+ ---------------- */

 /* Logging facilities. */
-static inline void bfq_pid_to_str(int pid, char *str, int len)
+static inline void bfq_bfqq_name(struct bfq_queue *bfqq, char *str, int len)
 {
-	if (pid != -1)
-		snprintf(str, len, "%d", pid);
+	char type = bfq_bfqq_sync(bfqq) ? 'S' : 'A';
+
+	if (bfqq->pid != -1)
+		snprintf(str, len, "bfq%d%c", bfqq->pid, type);
 	else
-		snprintf(str, len, "SHARED-");
+		snprintf(str, len, "bfqSHARED-%c", type);
 }

 #ifdef CONFIG_BFQ_GROUP_IOSCHED
 struct bfq_group *bfqq_group(struct bfq_queue *bfqq);

 #define bfq_log_bfqq(bfqd, bfqq, fmt, args...)	do {			\
-	char pid_str[MAX_PID_STR_LENGTH];				\
+	char pid_str[MAX_BFQQ_NAME_LENGTH];				\
 	if (likely(!blk_trace_note_message_enabled((bfqd)->queue)))	\
 		break;							\
-	bfq_pid_to_str((bfqq)->pid, pid_str, MAX_PID_STR_LENGTH);	\
+	bfq_bfqq_name((bfqq), pid_str, MAX_BFQQ_NAME_LENGTH);		\
 	blk_add_cgroup_trace_msg((bfqd)->queue,				\
 			bfqg_to_blkg(bfqq_group(bfqq))->blkcg,		\
-			"bfq%s%c " fmt, pid_str,			\
-			bfq_bfqq_sync((bfqq)) ? 'S' : 'A', ##args);	\
+			"%s " fmt, pid_str, ##args);			\
 } while (0)

 #define bfq_log_bfqg(bfqd, bfqg, fmt, args...)	do {			\
@@ -1113,13 +1114,11 @@ struct bfq_group *bfqq_group(struct bfq_queue *bfqq);
 #else /* CONFIG_BFQ_GROUP_IOSCHED */

 #define bfq_log_bfqq(bfqd, bfqq, fmt, args...) do {	\
-	char pid_str[MAX_PID_STR_LENGTH];				\
+	char pid_str[MAX_BFQQ_NAME_LENGTH];				\
 	if (likely(!blk_trace_note_message_enabled((bfqd)->queue)))	\
 		break;							\
-	bfq_pid_to_str((bfqq)->pid, pid_str, MAX_PID_STR_LENGTH);	\
-	blk_add_trace_msg((bfqd)->queue, "bfq%s%c " fmt, pid_str,	\
-			bfq_bfqq_sync((bfqq)) ? 'S' : 'A',		\
-			##args);					\
+	bfq_bfqq_name((bfqq), pid_str, MAX_BFQQ_NAME_LENGTH);		\
+	blk_add_trace_msg((bfqd)->queue, "%s " fmt, pid_str, ##args);	\
 } while (0)

 #define bfq_log_bfqg(bfqd, bfqg, fmt, args...)		do {} while (0)

From patchwork Wed Oct 6 17:31:46 2021
X-Patchwork-Id: 12539937
From: Jan Kara
To: Paolo Valente
Cc: Jens Axboe, Michal Koutný, Jan Kara
Subject: [PATCH 7/8] bfq: Log waker detections
Date: Wed, 6 Oct 2021 19:31:46 +0200
Message-Id: <20211006173157.6906-7-jack@suse.cz>
In-Reply-To: <20211006164110.10817-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

Waker-wakee relationships are important in deciding whether one queue can
preempt the other one. Print information about detected waker-wakee
relationships so that scheduling decisions can be better understood from
block traces.
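The queue names in these messages come from bfq_bfqq_name() in the previous patch. The following userspace model is a sketch, not kernel code: struct fake_bfqq and format_waker_msg() are hypothetical stand-ins, and the truncation assert is an extra check the kernel helper does not have. It shows the resulting trace text and checks that MAX_BFQQ_NAME_LENGTH of 16 is enough even for the widest 32-bit int pid: "bfq" + "-2147483648" + the type character is 15 characters plus the terminating NUL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_BFQQ_NAME_LENGTH 16

/* Hypothetical stand-in: only the two properties the helper reads. */
struct fake_bfqq {
	int pid;	/* -1 for a shared (merged) queue */
	bool sync;	/* models bfq_bfqq_sync() */
};

/* Userspace model of bfq_bfqq_name(). */
static void bfqq_name(const struct fake_bfqq *bfqq, char *str, int len)
{
	char type = bfqq->sync ? 'S' : 'A';
	int n;

	if (bfqq->pid != -1)
		n = snprintf(str, len, "bfq%d%c", bfqq->pid, type);
	else
		n = snprintf(str, len, "bfqSHARED-%c", type);
	/* Extra check, not in the kernel helper: the name was not truncated. */
	assert(n >= 0 && n < len);
}

/* Builds "<queue> set [tentative ]waker <waker>", in the style of the
 * waker messages, with the "%s " prefix that bfq_log_bfqq() adds. */
static void format_waker_msg(const struct fake_bfqq *bfqq,
			     const struct fake_bfqq *waker, bool tentative,
			     char *out, size_t out_len)
{
	char name[MAX_BFQQ_NAME_LENGTH];
	char waker_name[MAX_BFQQ_NAME_LENGTH];

	bfqq_name(bfqq, name, (int)sizeof(name));
	bfqq_name(waker, waker_name, (int)sizeof(waker_name));
	snprintf(out, out_len, "%s set %swaker %s", name,
		 tentative ? "tentative " : "", waker_name);
}
```

For example, a sync queue of pid 1234 that picks up a shared async queue as its candidate produces "bfq1234S set tentative waker bfqSHARED-A".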
Signed-off-by: Jan Kara
---
 block/bfq-iosched.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 6c5e9bafdb5d..886befc35b57 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2127,6 +2127,8 @@ static void bfq_update_io_intensity(struct bfq_queue *bfqq, u64 now_ns)
 static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 			    u64 now_ns)
 {
+	char waker_name[MAX_BFQQ_NAME_LENGTH];
+
 	if (!bfqd->last_completed_rq_bfqq ||
 	    bfqd->last_completed_rq_bfqq == bfqq ||
 	    bfq_bfqq_has_short_ttime(bfqq) ||
@@ -2154,12 +2156,18 @@ static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 			bfqd->last_completed_rq_bfqq;
 		bfqq->num_waker_detections = 1;
 		bfqq->waker_detection_started = now_ns;
+		bfq_bfqq_name(bfqq->tentative_waker_bfqq, waker_name,
+			      MAX_BFQQ_NAME_LENGTH);
+		bfq_log_bfqq(bfqd, bfqq, "set tentative waker %s", waker_name);
 	} else /* Same tentative waker queue detected again */
 		bfqq->num_waker_detections++;

 	if (bfqq->num_waker_detections == 3) {
 		bfqq->waker_bfqq = bfqd->last_completed_rq_bfqq;
 		bfqq->tentative_waker_bfqq = NULL;
+		bfq_bfqq_name(bfqq->waker_bfqq, waker_name,
+			      MAX_BFQQ_NAME_LENGTH);
+		bfq_log_bfqq(bfqd, bfqq, "set waker %s", waker_name);

 		/*
 		 * If the waker queue disappears, then

From patchwork Wed Oct 6 17:31:47 2021
X-Patchwork-Id: 12539941
From: Jan Kara
To: Paolo Valente
Cc: Jens Axboe, Michal Koutný, Jan Kara
Subject: [PATCH 8/8] bfq: Do not let waker requests skip proper accounting
Date: Wed, 6 Oct 2021 19:31:47 +0200
Message-Id: <20211006173157.6906-8-jack@suse.cz>
In-Reply-To: <20211006164110.10817-1-jack@suse.cz>
X-Mailing-List: linux-block@vger.kernel.org

Commit 7cc4ffc55564 ("block, bfq: put reqs of waker and woken in dispatch
list") added a condition to bfq_insert_request() which added the waker's
requests directly to the dispatch list. The rationale was that completing
the waker's IO is needed to get more IO for the current queue. Although
this rationale is valid, there is a hole in it. The waker does not
necessarily serve IO only for the current queue, and its current IO may
not be needed for the current queue to make progress. Furthermore,
injecting IO like this completely bypasses any service accounting within
bfq, so we do not properly track how much service the waker's queue is
getting, or even that the waker is doing any IO. Depending on the
conditions, this can result in the waker getting too much or too little
service.

Consider for example the following job file:

[global]
directory=/mnt/repro/
rw=write
size=8g
time_based
runtime=30
ramp_time=10
blocksize=1m
direct=0
ioengine=sync

[slowwriter]
numjobs=1
prioclass=2
prio=7
fsync=200

[fastwriter]
numjobs=1
prioclass=2
prio=0
fsync=200

Although the processes have very different IO priorities, they get the
same amount of service.
The reason is that bfq identifies these processes as having a waker-wakee
relationship, and once that happens, IO from fastwriter gets injected
during slowwriter's time slice. As a result, bfq is not aware that
fastwriter has any IO to do and constantly schedules only slowwriter's
queue. Thus fastwriter is forced to compete with slowwriter's IO all the
time instead of getting its share of time based on IO priority.

Drop the special injection condition from bfq_insert_request(). As a
result, requests will be tracked and queued in the normal way, and on the
next dispatch bfq_select_queue() can decide whether the waker's inserted
requests should be injected during the current queue's timeslice or not.

Fixes: 7cc4ffc55564 ("block, bfq: put reqs of waker and woken in dispatch list")
Signed-off-by: Jan Kara
---
 block/bfq-iosched.c | 44 +-------------------------------------------
 1 file changed, 1 insertion(+), 43 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 886befc35b57..803a0c313f0f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6132,48 +6132,7 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	spin_lock_irq(&bfqd->lock);
 	bfqq = bfq_init_rq(rq);
-
-	/*
-	 * Reqs with at_head or passthrough flags set are to be put
-	 * directly into dispatch list. Additional case for putting rq
-	 * directly into the dispatch queue: the only active
-	 * bfq_queues are bfqq and either its waker bfq_queue or one
-	 * of its woken bfq_queues. The rationale behind this
-	 * additional condition is as follows:
-	 * - consider a bfq_queue, say Q1, detected as a waker of
-	 *   another bfq_queue, say Q2
-	 * - by definition of a waker, Q1 blocks the I/O of Q2, i.e.,
-	 *   some I/O of Q1 needs to be completed for new I/O of Q2
-	 *   to arrive. A notable example of waker is journald
-	 * - so, Q1 and Q2 are in any respect the queues of two
-	 *   cooperating processes (or of two cooperating sets of
-	 *   processes): the goal of Q1's I/O is doing what needs to
-	 *   be done so that new Q2's I/O can finally be
-	 *   issued. Therefore, if the service of Q1's I/O is delayed,
-	 *   then Q2's I/O is delayed too. Conversely, if Q2's I/O is
-	 *   delayed, the goal of Q1's I/O is hindered.
-	 * - as a consequence, if some I/O of Q1/Q2 arrives while
-	 *   Q2/Q1 is the only queue in service, there is absolutely
-	 *   no point in delaying the service of such an I/O. The
-	 *   only possible result is a throughput loss
-	 * - so, when the above condition holds, the best option is to
-	 *   have the new I/O dispatched as soon as possible
-	 * - the most effective and efficient way to attain the above
-	 *   goal is to put the new I/O directly in the dispatch
-	 *   list
-	 * - as an additional restriction, Q1 and Q2 must be the only
-	 *   busy queues for this commit to put the I/O of Q2/Q1 in
-	 *   the dispatch list. This is necessary, because, if also
-	 *   other queues are waiting for service, then putting new
-	 *   I/O directly in the dispatch list may evidently cause a
-	 *   violation of service guarantees for the other queues
-	 */
-	if (!bfqq ||
-	    (bfqq != bfqd->in_service_queue &&
-	     bfqd->in_service_queue != NULL &&
-	     bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) &&
-	     (bfqq->waker_bfqq == bfqd->in_service_queue ||
-	      bfqd->in_service_queue->waker_bfqq == bfqq)) || at_head) {
+	if (!bfqq || at_head) {
 		if (at_head)
 			list_add(&rq->queuelist, &bfqd->dispatch);
 		else
@@ -6200,7 +6159,6 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	 * merge).
 	 */
 	cmd_flags = rq->cmd_flags;
-
 	spin_unlock_irq(&bfqd->lock);

 	bfq_update_insert_stats(q, bfqq, idle_timer_disabled,
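To make the behavioral change concrete, the condition removed from bfq_insert_request() and its replacement can be written as two predicates over a stripped-down queue model. This is a hypothetical sketch, not kernel code: struct queue stands in for struct bfq_queue, and the tot_busy and busy values stand in for bfq_tot_busy_queues() and bfq_bfqq_busy(). The fio example assumes the relationship was detected with slowwriter as fastwriter's waker; the removed condition accepted either direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct bfq_queue. */
struct queue {
	const struct queue *waker;	/* models waker_bfqq */
	bool busy;			/* models bfq_bfqq_busy() */
};

/* Old rule: besides at_head and queue-less requests, also bypass bfq when
 * the only busy queues are the in-service queue and its waker/wakee. */
static bool old_goes_to_dispatch(const struct queue *q,
				 const struct queue *in_service,
				 unsigned int tot_busy, bool at_head)
{
	if (!q || at_head)
		return true;
	return q != in_service && in_service != NULL &&
	       tot_busy == 1u + (q->busy ? 1u : 0u) &&
	       (q->waker == in_service || in_service->waker == q);
}

/* New rule after this patch. */
static bool new_goes_to_dispatch(const struct queue *q, bool at_head)
{
	return !q || at_head;
}

/* fio example: slowwriter is in service and is fastwriter's waker. */
static void fio_example(void)
{
	struct queue slow = { NULL, true };
	struct queue fast = { &slow, false };

	/* Before: fastwriter's request skipped bfq and its accounting. */
	assert(old_goes_to_dispatch(&fast, &slow, 1, false));
	/* After: it is queued in bfq and scheduled like any other request. */
	assert(!new_goes_to_dispatch(&fast, false));
}
```

Note that under the old rule fastwriter never became busy, because its requests went straight to the dispatch list; that in turn kept the "only two busy queues" test true, which is the feedback loop the commit message describes.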