From patchwork Thu Mar  4 17:46:22 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente <paolo.valente@linaro.org>
X-Patchwork-Id: 12116635
Return-Path: <linux-block-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,
	USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id C75B3C43381
	for <linux-block@archiver.kernel.org>; Thu,  4 Mar 2021 17:47:39 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 9D72D64F4C
	for <linux-block@archiver.kernel.org>; Thu,  4 Mar 2021 17:47:39 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S239875AbhCDRqz (ORCPT <rfc822;linux-block@archiver.kernel.org>);
        Thu, 4 Mar 2021 12:46:55 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48110 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S239480AbhCDRqL (ORCPT
        <rfc822;linux-block@vger.kernel.org>); Thu, 4 Mar 2021 12:46:11 -0500
Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com
 [IPv6:2a00:1450:4864:20::334])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C4B8C0613DD
        for <linux-block@vger.kernel.org>;
 Thu,  4 Mar 2021 09:45:27 -0800 (PST)
Received: by mail-wm1-x334.google.com with SMTP id m7so2177135wmq.0
        for <linux-block@vger.kernel.org>;
 Thu, 04 Mar 2021 09:45:27 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google;
        h=from:to:cc:subject:date:message-id:in-reply-to:references
         :mime-version:content-transfer-encoding;
        bh=kPe9/bC+fapDM9TyJPKrxL534jOiWcGHIwwJRVky7EU=;
        b=VJr7P5uBLKuYY1rMRD3JFi+li8ZpAK972sPQIW7fAMqv4MBC5pLyZV4PEPH7KNTFd/
         P8h2sFnz8YRjppTiHtUhoUDKq8mdCwkfJnoXoxmEGGzE9jZOYywEWkHrQAMnTxE9IMNo
         LL5/2jJxWH0u/sdzVdndhW9UeCKSXDen7KuiTWGLZ7nn8dLOjm002rgO3AZLVevCBsud
         gc76V9VWspuARIl3tfAM/NoQ8/xz73xvoeHyA8hURhHU51i2Jnzp23Iw0xSlHODilsyi
         rwgU5INYVb7xAXsRm4cbFByt/57YLxHq5V7BGEUFYfVAPHB+MTU9dojAYnbqy9TZKeGQ
         /WSw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
         :references:mime-version:content-transfer-encoding;
        bh=kPe9/bC+fapDM9TyJPKrxL534jOiWcGHIwwJRVky7EU=;
        b=Lof6L9wemEH7oNhk6sd6+Qk+4tfnKotvc27LqRtP8RVo/bYl/LPcdtT2QuwdujGTcm
         HTE6vJ5FewC5hV8q4oMxf0VnuSH6HfUU1HvBj2xCIoO13SA0qTLpRk7KnOOSYMkJw6Pq
         UMr0QCBItSvtq7bFSn6DyCP/cjdHi6zl8pQAyhUEHB9Q9wcwdzItZTnJYvYyA8mJSaFM
         Xa5vU4azx90z8mYt3ji00lnaVLaSExBdz6sJ7Vw48mVj8Q5PSq7g+ldbUy8WTWHsou1g
         hDlci40HuITdB8SSCihXwzSYWe1rhjG93TY2yV38ojENxJYSWWJQtrayneSzKWzfc1zM
         CErw==
X-Gm-Message-State: AOAM531YrFJqBkGNmOzmEa4cwC8soLVb+jfW+kSQBMchyC85askcb8Fb
        QBqcvWmFLNqGUSUJGOaU2TmCVw==
X-Google-Smtp-Source: 
 ABdhPJzwvhB5Zw+ULJWGpHtkDNXZUeVZWjNivjUYgxFJC35EZUg0yBfGSvjjkNlV7bsYqEJNzd13cg==
X-Received: by 2002:a1c:f312:: with SMTP id q18mr5012386wmq.79.1614879926349;
        Thu, 04 Mar 2021 09:45:26 -0800 (PST)
Received: from localhost.localdomain ([83.216.184.132])
        by smtp.gmail.com with ESMTPSA id a21sm271023wmb.5.2021.03.04.09.45.25
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Thu, 04 Mar 2021 09:45:25 -0800 (PST)
From: Paolo Valente <paolo.valente@linaro.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
        Paolo Valente <paolo.valente@linaro.org>,
        Jan Kara <jack@suse.cz>
Subject: [PATCH BUGFIX/IMPROVEMENT V2 1/6] block,
 bfq: always inject I/O of queues blocked by wakers
Date: Thu,  4 Mar 2021 18:46:22 +0100
Message-Id: <20210304174627.161-2-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210304174627.161-1-paolo.valente@linaro.org>
References: <20210304174627.161-1-paolo.valente@linaro.org>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-block.vger.kernel.org>
X-Mailing-List: linux-block@vger.kernel.org

Suppose that I/O dispatch is plugged, to wait for new I/O for the
in-service bfq-queue, say bfqq.  Suppose then that there is a further
bfq_queue woken by bfqq, and that this woken queue has pending I/O. A
woken queue does not steal bandwidth from bfqq, because it remains
soon without I/O if bfqq is not served. So there is virtually no risk
of loss of bandwidth for bfqq if this woken queue has I/O dispatched
while bfqq is waiting for new I/O. In contrast, this extra I/O
injection boosts throughput. This commit performs this extra
injection.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 32 +++++++++++++++++++++++++++-----
 block/bfq-wf2q.c    |  8 ++++++++
 2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 445cef9c0bb9..a83149407336 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4487,9 +4487,15 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
 			bfq_bfqq_busy(bfqq->bic->bfqq[0]) &&
 			bfqq->bic->bfqq[0]->next_rq ?
 			bfqq->bic->bfqq[0] : NULL;
+		struct bfq_queue *blocked_bfqq =
+			!hlist_empty(&bfqq->woken_list) ?
+			container_of(bfqq->woken_list.first,
+				     struct bfq_queue,
+				     woken_list_node)
+			: NULL;
 
 		/*
-		 * The next three mutually-exclusive ifs decide
+		 * The next four mutually-exclusive ifs decide
 		 * whether to try injection, and choose the queue to
 		 * pick an I/O request from.
 		 *
@@ -4522,7 +4528,15 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
 		 * next bfqq's I/O is brought forward dramatically,
 		 * for it is not blocked for milliseconds.
 		 *
-		 * The third if checks whether bfqq is a queue for
+		 * The third if checks whether there is a queue woken
+		 * by bfqq, and currently with pending I/O. Such a
+		 * woken queue does not steal bandwidth from bfqq,
+		 * because it remains soon without I/O if bfqq is not
+		 * served. So there is virtually no risk of loss of
+		 * bandwidth for bfqq if this woken queue has I/O
+		 * dispatched while bfqq is waiting for new I/O.
+		 *
+		 * The fourth if checks whether bfqq is a queue for
 		 * which it is better to avoid injection. It is so if
 		 * bfqq delivers more throughput when served without
 		 * any further I/O from other queues in the middle, or
@@ -4542,11 +4556,11 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
 		 * bfq_update_has_short_ttime(), it is rather likely
 		 * that, if I/O is being plugged for bfqq and the
 		 * waker queue has pending I/O requests that are
-		 * blocking bfqq's I/O, then the third alternative
+		 * blocking bfqq's I/O, then the fourth alternative
 		 * above lets the waker queue get served before the
 		 * I/O-plugging timeout fires. So one may deem the
 		 * second alternative superfluous. It is not, because
-		 * the third alternative may be way less effective in
+		 * the fourth alternative may be way less effective in
 		 * case of a synchronization. For two main
 		 * reasons. First, throughput may be low because the
 		 * inject limit may be too low to guarantee the same
@@ -4555,7 +4569,7 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
 		 * guarantees (the second alternative unconditionally
 		 * injects a pending I/O request of the waker queue
 		 * for each bfq_dispatch_request()). Second, with the
-		 * third alternative, the duration of the plugging,
+		 * fourth alternative, the duration of the plugging,
 		 * i.e., the time before bfqq finally receives new I/O,
 		 * may not be minimized, because the waker queue may
 		 * happen to be served only after other queues.
@@ -4573,6 +4587,14 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
 			   bfq_bfqq_budget_left(bfqq->waker_bfqq)
 			)
 			bfqq = bfqq->waker_bfqq;
+		else if (blocked_bfqq &&
+			   bfq_bfqq_busy(blocked_bfqq) &&
+			   blocked_bfqq->next_rq &&
+			   bfq_serv_to_charge(blocked_bfqq->next_rq,
+					      blocked_bfqq) <=
+			   bfq_bfqq_budget_left(blocked_bfqq)
+			)
+			bfqq = blocked_bfqq;
 		else if (!idling_boosts_thr_without_issues(bfqd, bfqq) &&
 			 (bfqq->wr_coeff == 1 || bfqd->wr_busy_queues > 1 ||
 			  !bfq_bfqq_has_short_ttime(bfqq)))
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index 26776bdbdf36..02e59931d897 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -1709,4 +1709,12 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 
 	if (bfqq->wr_coeff > 1)
 		bfqd->wr_busy_queues++;
+
+	/* Move bfqq to the head of the woken list of its waker */
+	if (!hlist_unhashed(&bfqq->woken_list_node) &&
+	    &bfqq->woken_list_node != bfqq->waker_bfqq->woken_list.first) {
+		hlist_del_init(&bfqq->woken_list_node);
+		hlist_add_head(&bfqq->woken_list_node,
+			       &bfqq->waker_bfqq->woken_list);
+	}
 }

From patchwork Thu Mar  4 17:46:23 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente <paolo.valente@linaro.org>
X-Patchwork-Id: 12116631
Return-Path: <linux-block-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,
	USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9D9DDC433E0
	for <linux-block@archiver.kernel.org>; Thu,  4 Mar 2021 17:47:39 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 64B7B64F1E
	for <linux-block@archiver.kernel.org>; Thu,  4 Mar 2021 17:47:39 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S239517AbhCDRq4 (ORCPT <rfc822;linux-block@archiver.kernel.org>);
        Thu, 4 Mar 2021 12:46:56 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48134 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S239477AbhCDRqL (ORCPT
        <rfc822;linux-block@vger.kernel.org>); Thu, 4 Mar 2021 12:46:11 -0500
Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com
 [IPv6:2a00:1450:4864:20::329])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C6D4C0613DF
        for <linux-block@vger.kernel.org>;
 Thu,  4 Mar 2021 09:45:29 -0800 (PST)
Received: by mail-wm1-x329.google.com with SMTP id h7so1197100wmf.3
        for <linux-block@vger.kernel.org>;
 Thu, 04 Mar 2021 09:45:28 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google;
        h=from:to:cc:subject:date:message-id:in-reply-to:references
         :mime-version:content-transfer-encoding;
        bh=TDH4J7SZQSpfvAWXs9TcScn8FkbkQLdueDYOLKxIe6w=;
        b=yHyha/eYbZDOx+kHdx8YDXvC7A5fWOXcB8Hp9T6I/n7oMtsW1Po+g4TO700Dvaoths
         4i1VjGFSigw3r455/97KkFnOUnR6l/ndmaaj9B6sZjVjDTruPYPK3hBBC2fi+kHGdA0B
         mz2Ilzw67rTZ/BublTzRXM5ibSDPWcSUQcgS9omru02wlZ3NatOotn6xbPsJMtmqNqS8
         iy58YznI48kHFdN9v+p+vU0pl7Ua97hIrr5evLSERIDgEdhaJygM1EnKtn+xko6qYmxt
         jHA1qfZS8YDh1L7SO7dWpaVLe825QJTtoLA3B8e0uK0quBQlJFHg0TG5Cj5Pdy51dnTk
         crBQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
         :references:mime-version:content-transfer-encoding;
        bh=TDH4J7SZQSpfvAWXs9TcScn8FkbkQLdueDYOLKxIe6w=;
        b=mkC58pmQntK0VjdkY90trseC5TjzoCSRo40qv6hb05B/fucICaY7BFV2u5PLVqlw1G
         osA4STi7S6QjUR7wpcuJaKljqiRWMf4lkNq0nxNn5wlPytvtAw4TwBjVDcGox8Le3y2z
         IbHmZ9swLSz4jZI/4LGV5+e9P4iG8UF5+HN03rvIZELoZ2uSIthA8EbLC9aKGEB2ZzNj
         ZYtK0fC18QRzYjum76eJp8cAV1SESv2eqYFB4zzfmBj9+PfMFxUM/YNleQ8FCiwRdCbT
         9N/Kmf7w9Mx45DXQHmDp9dhcI9t9ukAQkQ7nShg5RO/g6EspKXbeYnuH6SudMw2VaM3V
         HfBw==
X-Gm-Message-State: AOAM533a8iEUMGbPOu4qWdHtwlMhSJUaGexcNT18P33OfaRwUXWaO8+8
        3EjFZGZUuvmv8NI3HfTAMLmQfQ==
X-Google-Smtp-Source: 
 ABdhPJy3Br9YWrL8pTB2IvtICU1WoVDG6XWXnDgdCki+t/lGxLtpgmkmwmo0/R0uBQ+hmWh3iElPPA==
X-Received: by 2002:a1c:804e:: with SMTP id b75mr5112811wmd.113.1614879927639;
        Thu, 04 Mar 2021 09:45:27 -0800 (PST)
Received: from localhost.localdomain ([83.216.184.132])
        by smtp.gmail.com with ESMTPSA id a21sm271023wmb.5.2021.03.04.09.45.26
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Thu, 04 Mar 2021 09:45:27 -0800 (PST)
From: Paolo Valente <paolo.valente@linaro.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
        Paolo Valente <paolo.valente@linaro.org>,
        Jan Kara <jack@suse.cz>
Subject: [PATCH BUGFIX/IMPROVEMENT V2 2/6] block,
 bfq: put reqs of waker and woken in dispatch list
Date: Thu,  4 Mar 2021 18:46:23 +0100
Message-Id: <20210304174627.161-3-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210304174627.161-1-paolo.valente@linaro.org>
References: <20210304174627.161-1-paolo.valente@linaro.org>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-block.vger.kernel.org>
X-Mailing-List: linux-block@vger.kernel.org

Consider a new I/O request that arrives for a bfq_queue bfqq. If, when
this happens, the only active bfq_queues are bfqq and either its waker
bfq_queue or one of its woken bfq_queues, then there is no point in
queueing this new I/O request in bfqq for service. In fact, the
in-service queue and bfqq agree on serving this new I/O request as
soon as possible. So this commit puts this new I/O request directly
into the dispatch list.

Tested-by: Jan Kara <jack@suse.cz>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 44 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a83149407336..a9c1a14b64f4 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5640,7 +5640,49 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	spin_lock_irq(&bfqd->lock);
 	bfqq = bfq_init_rq(rq);
-	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
+
+	/*
+	 * Reqs with at_head or passthrough flags set are to be put
+	 * directly into dispatch list. Additional case for putting rq
+	 * directly into the dispatch queue: the only active
+	 * bfq_queues are bfqq and either its waker bfq_queue or one
+	 * of its woken bfq_queues. The rationale behind this
+	 * additional condition is as follows:
+	 * - consider a bfq_queue, say Q1, detected as a waker of
+	 *   another bfq_queue, say Q2
+	 * - by definition of a waker, Q1 blocks the I/O of Q2, i.e.,
+	 *   some I/O of Q1 needs to be completed for new I/O of Q2
+	 *   to arrive.  A notable example of waker is journald
+	 * - so, Q1 and Q2 are in any respect the queues of two
+	 *   cooperating processes (or of two cooperating sets of
+	 *   processes): the goal of Q1's I/O is doing what needs to
+	 *   be done so that new Q2's I/O can finally be
+	 *   issued. Therefore, if the service of Q1's I/O is delayed,
+	 *   then Q2's I/O is delayed too.  Conversely, if Q2's I/O is
+	 *   delayed, the goal of Q1's I/O is hindered.
+	 * - as a consequence, if some I/O of Q1/Q2 arrives while
+	 *   Q2/Q1 is the only queue in service, there is absolutely
+	 *   no point in delaying the service of such an I/O. The
+	 *   only possible result is a throughput loss
+	 * - so, when the above condition holds, the best option is to
+	 *   have the new I/O dispatched as soon as possible
+	 * - the most effective and efficient way to attain the above
+	 *   goal is to put the new I/O directly in the dispatch
+	 *   list
+	 * - as an additional restriction, Q1 and Q2 must be the only
+	 *   busy queues for this commit to put the I/O of Q2/Q1 in
+	 *   the dispatch list.  This is necessary, because, if also
+	 *   other queues are waiting for service, then putting new
+	 *   I/O directly in the dispatch list may evidently cause a
+	 *   violation of service guarantees for the other queues
+	 */
+	if (!bfqq ||
+	    (bfqq != bfqd->in_service_queue &&
+	     bfqd->in_service_queue != NULL &&
+	     bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) &&
+	     (bfqq->waker_bfqq == bfqd->in_service_queue ||
+	      bfqd->in_service_queue->waker_bfqq == bfqq)) ||
+	    at_head || blk_rq_is_passthrough(rq)) {
 		if (at_head)
 			list_add(&rq->queuelist, &bfqd->dispatch);
 		else

From patchwork Thu Mar  4 17:46:24 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente <paolo.valente@linaro.org>
X-Patchwork-Id: 12116639
Return-Path: <linux-block-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,
	USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id DC21AC4332D
	for <linux-block@archiver.kernel.org>; Thu,  4 Mar 2021 17:47:39 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id BDFBE64F57
	for <linux-block@archiver.kernel.org>; Thu,  4 Mar 2021 17:47:39 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S239477AbhCDRq5 (ORCPT <rfc822;linux-block@archiver.kernel.org>);
        Thu, 4 Mar 2021 12:46:57 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48122 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S239475AbhCDRqL (ORCPT
        <rfc822;linux-block@vger.kernel.org>); Thu, 4 Mar 2021 12:46:11 -0500
Received: from mail-wm1-x331.google.com (mail-wm1-x331.google.com
 [IPv6:2a00:1450:4864:20::331])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C8DEC0613E1
        for <linux-block@vger.kernel.org>;
 Thu,  4 Mar 2021 09:45:30 -0800 (PST)
Received: by mail-wm1-x331.google.com with SMTP id i9so8853445wml.0
        for <linux-block@vger.kernel.org>;
 Thu, 04 Mar 2021 09:45:30 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google;
        h=from:to:cc:subject:date:message-id:in-reply-to:references
         :mime-version:content-transfer-encoding;
        bh=CpdNy3aVzBPLz4zskOuLUeTUqeMtI64LdXSnJTLPY1g=;
        b=GPkvXnFiKfBcww25//CHtxNQHCRJwuNcosNdpu4fIPtjSdJsiRTpdfPS7OdamfBEO5
         JbNRTznkyntl0F3NmCUPktJDKDXVvjv9892sOs8/9WUntPeef64/XAe82HKosaoJeWQR
         fPUZq94AX1RrLkujTJYL39lQGKoSYJGymoYluRIrOASVQNI22kv9UKU2TBCIEXSrYvfM
         nWq5UcU+IuoxseNVq908TpzcsaAOf/4+QhhnVWjbOOGzIvLE2VJcYknpJEZ81CTWr0Ja
         3PGkLPHvHccJgxaPZ2sAMUlIIf4t29n/B8oKSZxuZ3bYirE2NafJd+EJ4rFyTOSTqGmP
         yPJw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
         :references:mime-version:content-transfer-encoding;
        bh=CpdNy3aVzBPLz4zskOuLUeTUqeMtI64LdXSnJTLPY1g=;
        b=dUNDs37MNqZS3fo25H4mZDrhrRX2+8IviosGUX38wzN3uYbYKMFO4WAn7GfFKCr/aF
         E5cnCMcDDL+fRlgBS79FZbP2s/XvFB+3CHsAJr83dvBvj1Bs3lM8w3SAbbmePe2ES83E
         Dhu2rY2JKdBnZNrH4lE7iYdDpLCn5S+2cGGO8PvWb/HOc4Y9i8YOCI2o7cC7ZxghPEA1
         kzXNPXM+5MJ7+WRdY/m4nzkDDALH3KlJxLxsgJKLTrZ6vRB18R8jBNRpPfHjKzZLNJoG
         9/IgT8WKuR6UZZDal9onlrLvZdaPhQusn6QwFcgcQq1/nH8ejONvr32ZQp8OPKQRC5lz
         rW+Q==
X-Gm-Message-State: AOAM532Ql143ntASWqBgkSsXvBoPSNqHqGiCyEvC9jea+4gw221XrbLE
        iYP1PRaGB37y9/EElRhQkhUguQ==
X-Google-Smtp-Source: 
 ABdhPJyJD+YF0j69T7Vwu87133tqy5JdmSjbivAHfFXRLXHtYkNix+8sy7MhVcPQIAlga48QsZSKDw==
X-Received: by 2002:a7b:c4c9:: with SMTP id g9mr5180733wmk.82.1614879929196;
        Thu, 04 Mar 2021 09:45:29 -0800 (PST)
Received: from localhost.localdomain ([83.216.184.132])
        by smtp.gmail.com with ESMTPSA id a21sm271023wmb.5.2021.03.04.09.45.27
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Thu, 04 Mar 2021 09:45:28 -0800 (PST)
From: Paolo Valente <paolo.valente@linaro.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
        Paolo Valente <paolo.valente@linaro.org>,
        Jan Kara <jack@suse.cz>
Subject: [PATCH BUGFIX/IMPROVEMENT V2 3/6] block,
 bfq: make shared queues inherit wakers
Date: Thu,  4 Mar 2021 18:46:24 +0100
Message-Id: <20210304174627.161-4-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210304174627.161-1-paolo.valente@linaro.org>
References: <20210304174627.161-1-paolo.valente@linaro.org>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-block.vger.kernel.org>
X-Mailing-List: linux-block@vger.kernel.org

Consider a bfq_queue bfqq that is about to be merged with another
bfq_queue new_bfqq. The processes associated with bfqq are cooperators
of the processes associated with new_bfqq. So, if bfqq has a waker,
then it is reasonable (and beneficial for throughput) to assume that
all these processes will be happy to let bfqq's waker freely inject
I/O when they have no I/O. So this commit makes new_bfqq inherit
bfqq's waker.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 42 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index a9c1a14b64f4..4b3d4849f3f5 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2819,6 +2819,29 @@ bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
 		bfq_mark_bfqq_IO_bound(new_bfqq);
 	bfq_clear_bfqq_IO_bound(bfqq);
 
+	/*
+	 * The processes associated with bfqq are cooperators of the
+	 * processes associated with new_bfqq. So, if bfqq has a
+	 * waker, then assume that all these processes will be happy
+	 * to let bfqq's waker freely inject I/O when they have no
+	 * I/O.
+	 */
+	if (bfqq->waker_bfqq && !new_bfqq->waker_bfqq &&
+	    bfqq->waker_bfqq != new_bfqq) {
+		new_bfqq->waker_bfqq = bfqq->waker_bfqq;
+		new_bfqq->tentative_waker_bfqq = NULL;
+
+		/*
+		 * If the waker queue disappears, then
+		 * new_bfqq->waker_bfqq must be reset. So insert
+		 * new_bfqq into the woken_list of the waker. See
+		 * bfq_check_waker for details.
+		 */
+		hlist_add_head(&new_bfqq->woken_list_node,
+			       &new_bfqq->waker_bfqq->woken_list);
+
+	}
+
 	/*
 	 * If bfqq is weight-raised, then let new_bfqq inherit
 	 * weight-raising. To reduce false positives, neglect the case
@@ -6303,7 +6326,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
 	if (likely(!new_queue)) {
 		/* If the queue was seeky for too long, break it apart. */
 		if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq)) {
-			bfq_log_bfqq(bfqd, bfqq, "breaking apart bfqq");
+			struct bfq_queue *old_bfqq = bfqq;
 
 			/* Update bic before losing reference to bfqq */
 			if (bfq_bfqq_in_large_burst(bfqq))
@@ -6312,11 +6335,24 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
 			bfqq = bfq_split_bfqq(bic, bfqq);
 			split = true;
 
-			if (!bfqq)
+			if (!bfqq) {
 				bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio,
 								 true, is_sync,
 								 NULL);
-			else
+				bfqq->waker_bfqq = old_bfqq->waker_bfqq;
+				bfqq->tentative_waker_bfqq = NULL;
+
+				/*
+				 * If the waker queue disappears, then
+				 * new_bfqq->waker_bfqq must be
+				 * reset. So insert new_bfqq into the
+				 * woken_list of the waker. See
+				 * bfq_check_waker for details.
+				 */
+				if (bfqq->waker_bfqq)
+					hlist_add_head(&bfqq->woken_list_node,
+						       &bfqq->waker_bfqq->woken_list);
+			} else
 				bfqq_already_existing = true;
 		}
 	}

From patchwork Thu Mar  4 17:46:25 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente <paolo.valente@linaro.org>
X-Patchwork-Id: 12116633
Return-Path: <linux-block-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,
	USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id C5556C433E9
	for <linux-block@archiver.kernel.org>; Thu,  4 Mar 2021 17:47:39 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id B01C264F1E
	for <linux-block@archiver.kernel.org>; Thu,  4 Mar 2021 17:47:39 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S239468AbhCDRq5 (ORCPT <rfc822;linux-block@archiver.kernel.org>);
        Thu, 4 Mar 2021 12:46:57 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48648 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S239624AbhCDRqg (ORCPT
        <rfc822;linux-block@vger.kernel.org>); Thu, 4 Mar 2021 12:46:36 -0500
Received: from mail-wm1-x32d.google.com (mail-wm1-x32d.google.com
 [IPv6:2a00:1450:4864:20::32d])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7CEE2C0613E5
        for <linux-block@vger.kernel.org>;
 Thu,  4 Mar 2021 09:45:31 -0800 (PST)
Received: by mail-wm1-x32d.google.com with SMTP id m7so2177339wmq.0
        for <linux-block@vger.kernel.org>;
 Thu, 04 Mar 2021 09:45:31 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google;
        h=from:to:cc:subject:date:message-id:in-reply-to:references
         :mime-version:content-transfer-encoding;
        bh=ZifMIAy4GGWpMtKfD5bvrDinqLUVzewXwC0UEUTmlwY=;
        b=PjYZ1GpCiBD4K0H0z8sr3//ABAf+ulr3tGusEeh/C3izrUkvaloOjyfQ51rrogquga
         oec8/6rWVQytHyJkUvjAGG19+tBG0Ad06w/eHjy27k0uAgQz1EImxVuoHri2/H5eIMmD
         pxA3q41/6l3VlgRlHp8grI2tIX68AVyokEPXVmjsb0SSunZwNdCT0WUGxqpufHvZTyFt
         alRqne7tAZ/ZNOKDrGRYslmEXLDGiwreqLy7nuInIBl9F2+3+K/GkGU/zZJwGlehre+y
         Kqa+vQaSnmxtfUJt+bVk3SOV8T4WOfLZAaROCswwZkOx4pMHBd/KiInH3c8ITBjsUuqc
         c9EA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
         :references:mime-version:content-transfer-encoding;
        bh=ZifMIAy4GGWpMtKfD5bvrDinqLUVzewXwC0UEUTmlwY=;
        b=MmrASw8DiYju3M98MfQ19UdwDXnyBtw3N+0EpWrB4Sm7jmaUfCLvB8ZIUh6QXGJfhm
         xIbhk3E4Kv+t8bcJVcct6Y92PXlVTr9pWXli4o9HDj3za9D/+Aev/A9i/091ywFVZDlz
         fuk2SNRYL34CLXFqWlJJdo9j1USSQU19K76zDLrsQhEti8G/y2EDXpAng0fYXfEh1O65
         QrkXFw9wG5h8QWraJTmXtmz+7E6+VMWqHBpno4fx+pzPeXMlxce5xUlz5racML1zhflG
         68Bw0ZT5rpdFq7qbam3bupi9wptpP63pNDLYL1PESaV1v4TnE3bLcixXyXQXTtK7szXt
         Bv6g==
X-Gm-Message-State: AOAM532w8i3MBoWSBWUNC0696M84CF07LWgnSx2iy77XHQV1zrHWSAvT
        dKiZyxih7g6q8JA28E0YoT0OFQ==
X-Google-Smtp-Source: 
 ABdhPJwSrAkfUofaXmnIahdY/3otBrfJ8ZpCF9gnb8mBk0i69W/Y3UzlbotwhW3L0SirKLRbithIEg==
X-Received: by 2002:a1c:1d14:: with SMTP id d20mr5146554wmd.36.1614879930430;
        Thu, 04 Mar 2021 09:45:30 -0800 (PST)
Received: from localhost.localdomain ([83.216.184.132])
        by smtp.gmail.com with ESMTPSA id a21sm271023wmb.5.2021.03.04.09.45.29
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Thu, 04 Mar 2021 09:45:30 -0800 (PST)
From: Paolo Valente <paolo.valente@linaro.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
        Paolo Valente <paolo.valente@linaro.org>,
        Jan Kara <jack@suse.cz>
Subject: [PATCH BUGFIX/IMPROVEMENT V2 4/6] block,
 bfq: fix weight-raising resume with !low_latency
Date: Thu,  4 Mar 2021 18:46:25 +0100
Message-Id: <20210304174627.161-5-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210304174627.161-1-paolo.valente@linaro.org>
References: <20210304174627.161-1-paolo.valente@linaro.org>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-block.vger.kernel.org>
X-Mailing-List: linux-block@vger.kernel.org

When the io_latency heuristic is off, bfq_queues must not start to be
weight-raised. Unfortunately, by mistake, this may happen when the
state of a previously weight-raised bfq_queue is resumed after a queue
split. This commit fixes this error.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 4b3d4849f3f5..8497d0803d74 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1010,7 +1010,7 @@ static void
 bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd,
 		      struct bfq_io_cq *bic, bool bfq_already_existing)
 {
-	unsigned int old_wr_coeff = bfqq->wr_coeff;
+	unsigned int old_wr_coeff = 1;
 	bool busy = bfq_already_existing && bfq_bfqq_busy(bfqq);
 
 	if (bic->saved_has_short_ttime)
@@ -1031,7 +1031,13 @@ bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd,
 	bfqq->ttime = bic->saved_ttime;
 	bfqq->io_start_time = bic->saved_io_start_time;
 	bfqq->tot_idle_time = bic->saved_tot_idle_time;
-	bfqq->wr_coeff = bic->saved_wr_coeff;
+	/*
+	 * Restore weight coefficient only if low_latency is on
+	 */
+	if (bfqd->low_latency) {
+		old_wr_coeff = bfqq->wr_coeff;
+		bfqq->wr_coeff = bic->saved_wr_coeff;
+	}
 	bfqq->service_from_wr = bic->saved_service_from_wr;
 	bfqq->wr_start_at_switch_to_srt = bic->saved_wr_start_at_switch_to_srt;
 	bfqq->last_wr_start_finish = bic->saved_last_wr_start_finish;

From patchwork Thu Mar  4 17:46:26 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente <paolo.valente@linaro.org>
X-Patchwork-Id: 12116637
Return-Path: <linux-block-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,
	USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E1252C4332E
	for <linux-block@archiver.kernel.org>; Thu,  4 Mar 2021 17:47:39 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id CBF0B64F5E
	for <linux-block@archiver.kernel.org>; Thu,  4 Mar 2021 17:47:39 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S239878AbhCDRq5 (ORCPT <rfc822;linux-block@archiver.kernel.org>);
        Thu, 4 Mar 2021 12:46:57 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48650 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S239612AbhCDRqg (ORCPT
        <rfc822;linux-block@vger.kernel.org>); Thu, 4 Mar 2021 12:46:36 -0500
Received: from mail-wm1-x333.google.com (mail-wm1-x333.google.com
 [IPv6:2a00:1450:4864:20::333])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7D1A8C0613E6
        for <linux-block@vger.kernel.org>;
 Thu,  4 Mar 2021 09:45:32 -0800 (PST)
Received: by mail-wm1-x333.google.com with SMTP id h7so1197274wmf.3
        for <linux-block@vger.kernel.org>;
 Thu, 04 Mar 2021 09:45:32 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google;
        h=from:to:cc:subject:date:message-id:in-reply-to:references
         :mime-version:content-transfer-encoding;
        bh=fqoAgrq4fJR4HLR5WghEXfCAYNkU/FugUfR5Q3syhNI=;
        b=qlMH0cAiJPI42edwa7vSynVTvwlt7RzKWBRJnfYew+7ItSFTZGvO+GflZH1ioB75VE
         7KwH/80xPrc6XmRzh+f9o1EuaBrM6O74hIx/arXEb6VSZTJuDVYKhc95T54asR64a5mx
         KDZuxHPG9+0fobf971+p3EgR6WvX4G44xnZSTSWgLKu+3EwLbJWTyEzyk6MB8pK8txhR
         S1cuFiOdYZX8UOfdssdUyHI/IxVhFdtmYvrDbELV/FRQZFvt6BiuQHuLTkCcpjH0iv8c
         dHrvHnJKyp4gdkVHTkXa44MOzxze864WvQpo2+jdOciSJANxt6Hl+4jk40ZpX7O/SvJc
         XQ4g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
         :references:mime-version:content-transfer-encoding;
        bh=fqoAgrq4fJR4HLR5WghEXfCAYNkU/FugUfR5Q3syhNI=;
        b=Qf0LZLinfH9uLTRWjLqDDx3DXsJ/5TXK3b0enoObcLGiaYspBRDSM6ZvpSJ0HTDfkb
         z11oUIzUFUMwPDkt/KnI9xGFxnFB/TmD7UpPQz3bAg7pOp6JBHWDN4MbW1qU4kWxSieO
         hvVR1kI2DvH6qrpKjlVoEJuAeLbe4EoPWDhDPWeU/HneM8VG4bZLdJL6cx3DxcB+Z6uU
         1im1i8ylbHHUqCu7x/0DET2TAhJYsosLqFzAII29FLRH6SxfVWylSuWHcy/OGnlDfXNa
         lgootNfJuog6a0eyFjxRjRQJ14Wvn+RtQ1tFK021p+Jchglskuj8snHdhljalOej9rwn
         R7Vw==
X-Gm-Message-State: AOAM533DxPP5VxSuu0A9Iz07D/hlTm1HJJ6VZhOnPsU5lHzg1QoJ5/Zh
        wCIXFCSTZhok3Yetf+SIHbKn2Q==
X-Google-Smtp-Source: 
 ABdhPJzUjkxf4oTl38XpoTuIZ90ldSKeCXuYNmoMUkpEQWZP8didLAO2Ul56bAwAhDv2VYbyZkl84w==
X-Received: by 2002:a1c:7407:: with SMTP id p7mr5107660wmc.51.1614879931659;
        Thu, 04 Mar 2021 09:45:31 -0800 (PST)
Received: from localhost.localdomain ([83.216.184.132])
        by smtp.gmail.com with ESMTPSA id a21sm271023wmb.5.2021.03.04.09.45.30
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Thu, 04 Mar 2021 09:45:31 -0800 (PST)
From: Paolo Valente <paolo.valente@linaro.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
        Paolo Valente <paolo.valente@linaro.org>,
        Jan Kara <jack@suse.cz>
Subject: [PATCH BUGFIX/IMPROVEMENT V2 5/6] block,
 bfq: keep shared queues out of the waker mechanism
Date: Thu,  4 Mar 2021 18:46:26 +0100
Message-Id: <20210304174627.161-6-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210304174627.161-1-paolo.valente@linaro.org>
References: <20210304174627.161-1-paolo.valente@linaro.org>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-block.vger.kernel.org>
X-Mailing-List: linux-block@vger.kernel.org

Shared queues are likely to receive I/O at a high rate. This may
deceptively let them be considered as wakers of other queues. But a
false waker will unjustly steal bandwidth to its supposedly woken
queue. So considering also shared queues in the waking mechanism may
cause more control troubles than throughput benefits. This commit
keeps shared queues out of the waker-detection mechanism.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 8497d0803d74..c62dbbe9cc33 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5852,7 +5852,17 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
 			1UL<<(BFQ_RATE_SHIFT - 10))
 		bfq_update_rate_reset(bfqd, NULL);
 	bfqd->last_completion = now_ns;
-	bfqd->last_completed_rq_bfqq = bfqq;
+	/*
+	 * Shared queues are likely to receive I/O at a high
+	 * rate. This may deceptively let them be considered as wakers
+	 * of other queues. But a false waker will unjustly steal
+	 * bandwidth to its supposedly woken queue. So considering
+	 * also shared queues in the waking mechanism may cause more
+	 * control troubles than throughput benefits. Then do not set
+	 * last_completed_rq_bfqq to bfqq if bfqq is a shared queue.
+	 */
+	if (!bfq_bfqq_coop(bfqq))
+		bfqd->last_completed_rq_bfqq = bfqq;
 
 	/*
 	 * If we are waiting to discover whether the request pattern

From patchwork Thu Mar  4 17:46:27 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente <paolo.valente@linaro.org>
X-Patchwork-Id: 12116641
Return-Path: <linux-block-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-18.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,
	USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E6E39C433E6
	for <linux-block@archiver.kernel.org>; Thu,  4 Mar 2021 17:47:57 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id B38D064F1E
	for <linux-block@archiver.kernel.org>; Thu,  4 Mar 2021 17:47:57 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S236477AbhCDRr0 (ORCPT <rfc822;linux-block@archiver.kernel.org>);
        Thu, 4 Mar 2021 12:47:26 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48740 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S239909AbhCDRrA (ORCPT
        <rfc822;linux-block@vger.kernel.org>); Thu, 4 Mar 2021 12:47:00 -0500
Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com
 [IPv6:2a00:1450:4864:20::334])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 168DCC0613E8
        for <linux-block@vger.kernel.org>;
 Thu,  4 Mar 2021 09:45:34 -0800 (PST)
Received: by mail-wm1-x334.google.com with SMTP id m1so10510088wml.2
        for <linux-block@vger.kernel.org>;
 Thu, 04 Mar 2021 09:45:34 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google;
        h=from:to:cc:subject:date:message-id:in-reply-to:references
         :mime-version:content-transfer-encoding;
        bh=rUJYO+IGGMUmfH4jP8L8tpLpJtsLPrUa2la0FyD/fNk=;
        b=w71kKpWQdhCsUkBUQjZcnm852YrBxPbDflL4e5eq8Z7eCNmiVdswPSnjDOQuYJmzfe
         kHaVMjQUQn6NjJ0RB/ryZqAyNXV0a+JZFSTYzm3e+9g6l1L1F/gxYRCaSSqlzW1H4kkr
         X8+ksVZS60DKqRCMLH+311IYTJrMwzrffNqMPtSl35Zfhca1/KiMID8sPGmC04TFkO+U
         E999xkB4rpXkzYgXJAoHyX4iwSpxDScPA2PtSGNDFsGAb5X01tuF2WBqbkvwjp5RH3gs
         QZmwWfjc5jD7ORbzIVR+Kn2AoXV6SuKYDsGKzgx+V8nCY2U+GMH0Hzl+y8bfTkI0XIjb
         rEIg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
         :references:mime-version:content-transfer-encoding;
        bh=rUJYO+IGGMUmfH4jP8L8tpLpJtsLPrUa2la0FyD/fNk=;
        b=aERu4U5+CF/zPCNOP5OkBNgFW5sFbtUXsUTJSvMp9nX1ccB8XYgJJlhbvpDPUmq2an
         EoxjJQ77/6XEoJRmyj6a1637cPEZ/9j15QKfRnUDgoW8QQJLGxDJMdwWWs7weCK0Bj/z
         NmXEpsGY4IwoZxXmqRP2dnXJTG/7NZ6MxmsP9PSJpGmTd78RARacEak/Wl6+VNGEQ46Z
         KSU6mPK4gUjfNtGbHPM6x0kV7diuTw0Lhcg6GR42wzgarb6IF1gGGdE1PmP58jUR7gzs
         CckJb7swN6xC8iA1tfl7W/ZGeq1sZdzqatbS2oRpbMubIjH0lPl3UgSaUIc3Wj0qFrXF
         Hawg==
X-Gm-Message-State: AOAM530P0sBowfasv7FTUvsnfpVenlZbDqUmiNHfJltC36g6s+OflKgG
        hywiOQkAuPpmmpIl8mswgd4/eQ==
X-Google-Smtp-Source: 
 ABdhPJzKHnQ5n2U6++9t9B6tZNjfgHBzeETen39ABFLxdfoMORV+IlsoQF3VkuBXawComtjhQdL0jg==
X-Received: by 2002:a1c:400b:: with SMTP id n11mr5021778wma.167.1614879933070;
        Thu, 04 Mar 2021 09:45:33 -0800 (PST)
Received: from localhost.localdomain ([83.216.184.132])
        by smtp.gmail.com with ESMTPSA id a21sm271023wmb.5.2021.03.04.09.45.31
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Thu, 04 Mar 2021 09:45:32 -0800 (PST)
From: Paolo Valente <paolo.valente@linaro.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
        Paolo Valente <paolo.valente@linaro.org>,
        Jan Kara <jack@suse.cz>
Subject: [PATCH BUGFIX/IMPROVEMENT V2 6/6] block,
 bfq: merge bursts of newly-created queues
Date: Thu,  4 Mar 2021 18:46:27 +0100
Message-Id: <20210304174627.161-7-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210304174627.161-1-paolo.valente@linaro.org>
References: <20210304174627.161-1-paolo.valente@linaro.org>
MIME-Version: 1.0
Precedence: bulk
List-ID: <linux-block.vger.kernel.org>
X-Mailing-List: linux-block@vger.kernel.org

Many throughput-sensitive workloads are made of several parallel I/O
flows, with all flows generated by the same application, or more
generically by the same task (e.g., system boot). The most
counterproductive action with these workloads is plugging I/O dispatch
when one of the bfq_queues associated with these flows remains
temporarily empty.

To avoid this plugging, BFQ has been using a burst-handling mechanism
for years now. This mechanism has proven effective for throughput, and
not detrimental for service guarantees. This commit pushes this
mechanism a little bit further, basing on the following two facts.

First, all the I/O flows of a the same application or task contribute
to the execution/completion of that common application or task. So the
performance figures that matter are total throughput of the flows and
task-wide I/O latency.  In particular, these flows do not need to be
protected from each other, in terms of individual bandwidth or
latency.

Second, the above fact holds regardless of the number of flows.

Putting these two facts together, this commits merges stably the
bfq_queues associated with these I/O flows, i.e., with the processes
that generate these IO/ flows, regardless of how many the involved
processes are.

To decide whether a set of bfq_queues is actually associated with the
I/O flows of a common application or task, and to merge these queues
stably, this commit operates as follows: given a bfq_queue, say Q2,
currently being created, and the last bfq_queue, say Q1, created
before Q2, Q2 is merged stably with Q1 if
- very little time has elapsed since when Q1 was created
- Q2 has the same ioprio as Q1
- Q2 belongs to the same group as Q1

Merging bfq_queues also reduces scheduling overhead. A fio test with
ten random readers on /dev/nullb shows a throughput boost of 40%, with
a quadcore. Since BFQ's execution time amounts to ~50% of the total
per-request processing time, the above throughput boost implies that
BFQ's overhead is reduced by more than 50%.

Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-cgroup.c  |   2 +
 block/bfq-iosched.c | 259 ++++++++++++++++++++++++++++++++++++++++++--
 block/bfq-iosched.h |  15 +++
 3 files changed, 266 insertions(+), 10 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index b791e2041e49..e2f14508f2d6 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -547,6 +547,8 @@ static void bfq_pd_init(struct blkg_policy_data *pd)
 
 	entity->orig_weight = entity->weight = entity->new_weight = d->weight;
 	entity->my_sched_data = &bfqg->sched_data;
+	entity->last_bfqq_created = NULL;
+
 	bfqg->my_entity = entity; /*
 				   * the root_group's will be set to NULL
 				   * in bfq_init_queue()
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index c62dbbe9cc33..4ba89c55a856 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1073,7 +1073,7 @@ bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd,
 static int bfqq_process_refs(struct bfq_queue *bfqq)
 {
 	return bfqq->ref - bfqq->allocated - bfqq->entity.on_st_or_in_serv -
-		(bfqq->weight_counter != NULL);
+		(bfqq->weight_counter != NULL) - bfqq->stable_ref;
 }
 
 /* Empty burst list and add just bfqq (see comments on bfq_handle_burst) */
@@ -2625,6 +2625,11 @@ static bool bfq_may_be_close_cooperator(struct bfq_queue *bfqq,
 	return true;
 }
 
+static bool idling_boosts_thr_without_issues(struct bfq_data *bfqd,
+					     struct bfq_queue *bfqq);
+
+static void bfq_put_stable_ref(struct bfq_queue *bfqq);
+
 /*
  * Attempt to schedule a merge of bfqq with the currently in-service
  * queue or with a close queue among the scheduled queues.  Return
@@ -2647,10 +2652,49 @@ static bool bfq_may_be_close_cooperator(struct bfq_queue *bfqq,
  */
 static struct bfq_queue *
 bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-		     void *io_struct, bool request)
+		     void *io_struct, bool request, struct bfq_io_cq *bic)
 {
 	struct bfq_queue *in_service_bfqq, *new_bfqq;
 
+	/*
+	 * Check delayed stable merge for rotational or non-queueing
+	 * devs. For this branch to be executed, bfqq must not be
+	 * currently merged with some other queue (i.e., bfqq->bic
+	 * must be non null). If we considered also merged queues,
+	 * then we should also check whether bfqq has already been
+	 * merged with bic->stable_merge_bfqq. But this would be
+	 * costly and complicated.
+	 */
+	if (unlikely(!bfqd->nonrot_with_queueing)) {
+		if (bic->stable_merge_bfqq &&
+		    !bfq_bfqq_just_created(bfqq) &&
+		    time_is_after_jiffies(bfqq->split_time +
+					  msecs_to_jiffies(200))) {
+			struct bfq_queue *stable_merge_bfqq =
+				bic->stable_merge_bfqq;
+			int proc_ref = min(bfqq_process_refs(bfqq),
+					   bfqq_process_refs(stable_merge_bfqq));
+
+			/* deschedule stable merge, because done or aborted here */
+			bfq_put_stable_ref(stable_merge_bfqq);
+
+			bic->stable_merge_bfqq = NULL;
+
+			if (!idling_boosts_thr_without_issues(bfqd, bfqq) &&
+			    proc_ref > 0) {
+				/* next function will take at least one ref */
+				struct bfq_queue *new_bfqq =
+					bfq_setup_merge(bfqq, stable_merge_bfqq);
+
+				bic->stably_merged = true;
+				if (new_bfqq && new_bfqq->bic)
+					new_bfqq->bic->stably_merged = true;
+				return new_bfqq;
+			} else
+				return NULL;
+		}
+	}
+
 	/*
 	 * Do not perform queue merging if the device is non
 	 * rotational and performs internal queueing. In fact, such a
@@ -2792,6 +2836,17 @@ static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
 	}
 }
 
+
+static void
+bfq_reassign_last_bfqq(struct bfq_queue *cur_bfqq, struct bfq_queue *new_bfqq)
+{
+	if (cur_bfqq->entity.parent &&
+	    cur_bfqq->entity.parent->last_bfqq_created == cur_bfqq)
+		cur_bfqq->entity.parent->last_bfqq_created = new_bfqq;
+	else if (cur_bfqq->bfqd && cur_bfqq->bfqd->last_bfqq_created == cur_bfqq)
+		cur_bfqq->bfqd->last_bfqq_created = new_bfqq;
+}
+
 void bfq_release_process_ref(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 {
 	/*
@@ -2809,6 +2864,8 @@ void bfq_release_process_ref(struct bfq_data *bfqd, struct bfq_queue *bfqq)
 	    bfqq != bfqd->in_service_queue)
 		bfq_del_bfqq_busy(bfqd, bfqq, false);
 
+	bfq_reassign_last_bfqq(bfqq, NULL);
+
 	bfq_put_queue(bfqq);
 }
 
@@ -2905,6 +2962,9 @@ bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
 	 */
 	new_bfqq->pid = -1;
 	bfqq->bic = NULL;
+
+	bfq_reassign_last_bfqq(bfqq, new_bfqq);
+
 	bfq_release_process_ref(bfqd, bfqq);
 }
 
@@ -2932,7 +2992,7 @@ static bool bfq_allow_bio_merge(struct request_queue *q, struct request *rq,
 	 * We take advantage of this function to perform an early merge
 	 * of the queues of possible cooperating processes.
 	 */
-	new_bfqq = bfq_setup_cooperator(bfqd, bfqq, bio, false);
+	new_bfqq = bfq_setup_cooperator(bfqd, bfqq, bio, false, bfqd->bio_bic);
 	if (new_bfqq) {
 		/*
 		 * bic still points to bfqq, then it has not yet been
@@ -5033,6 +5093,12 @@ void bfq_put_queue(struct bfq_queue *bfqq)
 	bfqg_and_blkg_put(bfqg);
 }
 
+static void bfq_put_stable_ref(struct bfq_queue *bfqq)
+{
+	bfqq->stable_ref--;
+	bfq_put_queue(bfqq);
+}
+
 static void bfq_put_cooperator(struct bfq_queue *bfqq)
 {
 	struct bfq_queue *__bfqq, *next;
@@ -5089,6 +5155,24 @@ static void bfq_exit_icq(struct io_cq *icq)
 {
 	struct bfq_io_cq *bic = icq_to_bic(icq);
 
+	if (bic->stable_merge_bfqq) {
+		struct bfq_data *bfqd = bic->stable_merge_bfqq->bfqd;
+
+		/*
+		 * bfqd is NULL if scheduler already exited, and in
+		 * that case this is the last time bfqq is accessed.
+		 */
+		if (bfqd) {
+			unsigned long flags;
+
+			spin_lock_irqsave(&bfqd->lock, flags);
+			bfq_put_stable_ref(bic->stable_merge_bfqq);
+			spin_unlock_irqrestore(&bfqd->lock, flags);
+		} else {
+			bfq_put_stable_ref(bic->stable_merge_bfqq);
+		}
+	}
+
 	bfq_exit_icq_bfqq(bic, true);
 	bfq_exit_icq_bfqq(bic, false);
 }
@@ -5149,7 +5233,8 @@ bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
 
 static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
 				       struct bio *bio, bool is_sync,
-				       struct bfq_io_cq *bic);
+				       struct bfq_io_cq *bic,
+				       bool respawn);
 
 static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio)
 {
@@ -5169,7 +5254,7 @@ static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio)
 	bfqq = bic_to_bfqq(bic, false);
 	if (bfqq) {
 		bfq_release_process_ref(bfqd, bfqq);
-		bfqq = bfq_get_queue(bfqd, bio, BLK_RW_ASYNC, bic);
+		bfqq = bfq_get_queue(bfqd, bio, BLK_RW_ASYNC, bic, true);
 		bic_set_bfqq(bic, bfqq, false);
 	}
 
@@ -5212,6 +5297,8 @@ static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	/* set end request to minus infinity from now */
 	bfqq->ttime.last_end_request = now_ns + 1;
 
+	bfqq->creation_time = jiffies;
+
 	bfqq->io_start_time = now_ns;
 
 	bfq_mark_bfqq_IO_bound(bfqq);
@@ -5261,9 +5348,156 @@ static struct bfq_queue **bfq_async_queue_prio(struct bfq_data *bfqd,
 	}
 }
 
+static struct bfq_queue *
+bfq_do_early_stable_merge(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+			  struct bfq_io_cq *bic,
+			  struct bfq_queue *last_bfqq_created)
+{
+	struct bfq_queue *new_bfqq =
+		bfq_setup_merge(bfqq, last_bfqq_created);
+
+	if (!new_bfqq)
+		return bfqq;
+
+	if (new_bfqq->bic)
+		new_bfqq->bic->stably_merged = true;
+	bic->stably_merged = true;
+
+	/*
+	 * Reusing merge functions. This implies that
+	 * bfqq->bic must be set too, for
+	 * bfq_merge_bfqqs to correctly save bfqq's
+	 * state before killing it.
+	 */
+	bfqq->bic = bic;
+	bfq_merge_bfqqs(bfqd, bic, bfqq, new_bfqq);
+
+	return new_bfqq;
+}
+
+/*
+ * Many throughput-sensitive workloads are made of several parallel
+ * I/O flows, with all flows generated by the same application, or
+ * more generically by the same task (e.g., system boot). The most
+ * counterproductive action with these workloads is plugging I/O
+ * dispatch when one of the bfq_queues associated with these flows
+ * remains temporarily empty.
+ *
+ * To avoid this plugging, BFQ has been using a burst-handling
+ * mechanism for years now. This mechanism has proven effective for
+ * throughput, and not detrimental for service guarantees. The
+ * following function pushes this mechanism a little bit further,
+ * basing on the following two facts.
+ *
+ * First, all the I/O flows of a the same application or task
+ * contribute to the execution/completion of that common application
+ * or task. So the performance figures that matter are total
+ * throughput of the flows and task-wide I/O latency.  In particular,
+ * these flows do not need to be protected from each other, in terms
+ * of individual bandwidth or latency.
+ *
+ * Second, the above fact holds regardless of the number of flows.
+ *
+ * Putting these two facts together, this commits merges stably the
+ * bfq_queues associated with these I/O flows, i.e., with the
+ * processes that generate these IO/ flows, regardless of how many the
+ * involved processes are.
+ *
+ * To decide whether a set of bfq_queues is actually associated with
+ * the I/O flows of a common application or task, and to merge these
+ * queues stably, this function operates as follows: given a bfq_queue,
+ * say Q2, currently being created, and the last bfq_queue, say Q1,
+ * created before Q2, Q2 is merged stably with Q1 if
+ * - very little time has elapsed since when Q1 was created
+ * - Q2 has the same ioprio as Q1
+ * - Q2 belongs to the same group as Q1
+ *
+ * Merging bfq_queues also reduces scheduling overhead. A fio test
+ * with ten random readers on /dev/nullb shows a throughput boost of
+ * 40%, with a quadcore. Since BFQ's execution time amounts to ~50% of
+ * the total per-request processing time, the above throughput boost
+ * implies that BFQ's overhead is reduced by more than 50%.
+ *
+ * This new mechanism most certainly obsoletes the current
+ * burst-handling heuristics. We keep those heuristics for the moment.
+ */
+static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
+						      struct bfq_queue *bfqq,
+						      struct bfq_io_cq *bic)
+{
+	struct bfq_queue **source_bfqq = bfqq->entity.parent ?
+		&bfqq->entity.parent->last_bfqq_created :
+		&bfqd->last_bfqq_created;
+
+	struct bfq_queue *last_bfqq_created = *source_bfqq;
+
+	/*
+	 * If last_bfqq_created has not been set yet, then init it. If
+	 * it has been set already, but too long ago, then move it
+	 * forward to bfqq. Finally, move also if bfqq belongs to a
+	 * different group than last_bfqq_created, or if bfqq has a
+	 * different ioprio or ioprio_class. If none of these
+	 * conditions holds true, then try an early stable merge or
+	 * schedule a delayed stable merge.
+	 *
+	 * A delayed merge is scheduled (instead of performing an
+	 * early merge), in case bfqq might soon prove to be more
+	 * throughput-beneficial if not merged. Currently this is
+	 * possible only if bfqd is rotational with no queueing. For
+	 * such a drive, not merging bfqq is better for throughput if
+	 * bfqq happens to contain sequential I/O. So, we wait a
+	 * little bit for enough I/O to flow through bfqq. After that,
+	 * if such an I/O is sequential, then the merge is
+	 * canceled. Otherwise the merge is finally performed.
+	 */
+	if (!last_bfqq_created ||
+	    time_before(last_bfqq_created->creation_time +
+			bfqd->bfq_burst_interval,
+			bfqq->creation_time) ||
+		bfqq->entity.parent != last_bfqq_created->entity.parent ||
+		bfqq->ioprio != last_bfqq_created->ioprio ||
+		bfqq->ioprio_class != last_bfqq_created->ioprio_class)
+		*source_bfqq = bfqq;
+	else if (time_after_eq(last_bfqq_created->creation_time +
+				 bfqd->bfq_burst_interval,
+				 bfqq->creation_time)) {
+		if (likely(bfqd->nonrot_with_queueing))
+			/*
+			 * With this type of drive, leaving
+			 * bfqq alone may provide no
+			 * throughput benefits compared with
+			 * merging bfqq. So merge bfqq now.
+			 */
+			bfqq = bfq_do_early_stable_merge(bfqd, bfqq,
+							 bic,
+							 last_bfqq_created);
+		else { /* schedule tentative stable merge */
+			/*
+			 * get reference on last_bfqq_created,
+			 * to prevent it from being freed,
+			 * until we decide whether to merge
+			 */
+			last_bfqq_created->ref++;
+			/*
+			 * need to keep track of stable refs, to
+			 * compute process refs correctly
+			 */
+			last_bfqq_created->stable_ref++;
+			/*
+			 * Record the bfqq to merge to.
+			 */
+			bic->stable_merge_bfqq = last_bfqq_created;
+		}
+	}
+
+	return bfqq;
+}
+
+
 static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
 				       struct bio *bio, bool is_sync,
-				       struct bfq_io_cq *bic)
+				       struct bfq_io_cq *bic,
+				       bool respawn)
 {
 	const int ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
 	const int ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
@@ -5321,7 +5555,10 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
 
 out:
 	bfqq->ref++; /* get a process reference to this queue */
-	bfq_log_bfqq(bfqd, bfqq, "get_queue, at end: %p, %d", bfqq, bfqq->ref);
+
+	if (bfqq != &bfqd->oom_bfqq && is_sync && !respawn)
+		bfqq = bfq_do_or_sched_stable_merge(bfqd, bfqq, bic);
+
 	rcu_read_unlock();
 	return bfqq;
 }
@@ -5563,7 +5800,8 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
 {
 	struct bfq_queue *bfqq = RQ_BFQQ(rq),
-		*new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true);
+		*new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true,
+						 RQ_BIC(rq));
 	bool waiting, idle_timer_disabled = false;
 
 	if (new_bfqq) {
@@ -6220,7 +6458,7 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
 
 	if (bfqq)
 		bfq_put_queue(bfqq);
-	bfqq = bfq_get_queue(bfqd, bio, is_sync, bic);
+	bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, split);
 
 	bic_set_bfqq(bic, bfqq, is_sync);
 	if (split && is_sync) {
@@ -6341,7 +6579,8 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
 
 	if (likely(!new_queue)) {
 		/* If the queue was seeky for too long, break it apart. */
-		if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq)) {
+		if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq) &&
+			!bic->stably_merged) {
 			struct bfq_queue *old_bfqq = bfqq;
 
 			/* Update bic before losing reference to bfqq */
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index b8e793c34ff1..99c2a3cb081e 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -197,6 +197,9 @@ struct bfq_entity {
 
 	/* flag, set if the entity is counted in groups_with_pending_reqs */
 	bool in_groups_with_pending_reqs;
+
+	/* last child queue of entity created (for non-leaf entities) */
+	struct bfq_queue *last_bfqq_created;
 };
 
 struct bfq_group;
@@ -230,6 +233,8 @@ struct bfq_ttime {
 struct bfq_queue {
 	/* reference counter */
 	int ref;
+	/* counter of references from other queues for delayed stable merge */
+	int stable_ref;
 	/* parent bfq_data */
 	struct bfq_data *bfqd;
 
@@ -365,6 +370,8 @@ struct bfq_queue {
 
 	unsigned long first_IO_time; /* time of first I/O for this queue */
 
+	unsigned long creation_time; /* when this queue is created */
+
 	/* max service rate measured so far */
 	u32 max_service_rate;
 
@@ -454,6 +461,11 @@ struct bfq_io_cq {
 	u64 saved_last_serv_time_ns;
 	unsigned int saved_inject_limit;
 	unsigned long saved_decrease_time_jif;
+
+	/* candidate queue for a stable merge (due to close creation time) */
+	struct bfq_queue *stable_merge_bfqq;
+
+	bool stably_merged;	/* non splittable if true */
 };
 
 /**
@@ -578,6 +590,9 @@ struct bfq_data {
 	/* bfqq owning the last completed rq */
 	struct bfq_queue *last_completed_rq_bfqq;
 
+	/* last bfqq created, among those in the root group */
+	struct bfq_queue *last_bfqq_created;
+
 	/* time of last transition from empty to non-empty (ns) */
 	u64 last_empty_occupied_ns;