From patchwork Wed Dec 27 12:15:07 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Valente <paolo.valente@linaro.org>
X-Patchwork-Id: 10133607
Return-Path: <linux-block-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	DEDED60388 for <patchwork-linux-block@patchwork.kernel.org>;
	Wed, 27 Dec 2017 12:15:48 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CE4B62DA28
	for <patchwork-linux-block@patchwork.kernel.org>;
	Wed, 27 Dec 2017 12:15:48 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id C17B02DA42; Wed, 27 Dec 2017 12:15:48 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-7.0 required=2.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID, DKIM_VALID_AU,
	RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4B9642DA28
	for <patchwork-linux-block@patchwork.kernel.org>;
	Wed, 27 Dec 2017 12:15:48 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751115AbdL0MPf (ORCPT
	<rfc822;patchwork-linux-block@patchwork.kernel.org>);
	Wed, 27 Dec 2017 07:15:35 -0500
Received: from mail-wr0-f196.google.com ([209.85.128.196]:34343 "EHLO
	mail-wr0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751257AbdL0MPX (ORCPT
	<rfc822;linux-block@vger.kernel.org>);
	Wed, 27 Dec 2017 07:15:23 -0500
Received: by mail-wr0-f196.google.com with SMTP id 36so3319894wrh.1
	for <linux-block@vger.kernel.org>;
	Wed, 27 Dec 2017 04:15:22 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google;
	h=from:to:cc:subject:date:message-id:in-reply-to:references;
	bh=LmPi9EV+X0tQdeIDjCwGX1FXGTY+52ixsBa61AAFsfY=;
	b=iLidzMfsocMxZXGFx5zS0uBQbfVWdJcaK+0TLc335qYj+z+WjTBhS0hretUQexpp4U
	SsRZpsWyMVxTPq5b6Zs3E9hKVUEqJWgq1r/Z6UlfC4uOGF7XDA6tiFhPmes13MpovJTF
	M5ifMISYsJN1jFa6+2mX6bNGsdZ9FvB4X+47o=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
	:references;
	bh=LmPi9EV+X0tQdeIDjCwGX1FXGTY+52ixsBa61AAFsfY=;
	b=mpvF50zfDg3mOAWpncvaGHdgZJ+Og1aoQ/d0vhBT1eigHoDgkWPWjvAPcVYLwp2iFD
	nKJCdqQ9n3DwMJaktpwfBei7Cjnf5rj9aYZw12LI4l8myeTIjnn41SupWkglMhxJEocj
	YCdswY/7XFA4Eu6bVffl/RzZiWWP+BJv7c7OWB9JKrWQ//N8fh34U5ZdpziTcksdDAct
	/CPXw+Zf+YjTHC9nObHM1uLZnCa0R7OHnCfJRxpCEPIz0B+o8N+e/gm79hpVXfFqXmRd
	76wbSy5j/GOFVgEJO3mBII1WvkBWrcCayJq74D0zUexJpPMTqrHpAdC4+h1CFBk2FnZb
	YR3w==
X-Gm-Message-State: AKGB3mIj2iEi+Ny5zqmlUSLm8Afw0NxCuwt4JozmRdkbYWXgdjMV7K7R
	47xqJo1fl7kj7/P7OMTIgL/pLA==
X-Google-Smtp-Source: 
 ACJfBosrEvnxVkRcreB0zvE0RTGgi/EQI2jN5sdIvfQO4p8hGKW0ZbwN+KqpCCD2oMrVajj8vFIzfQ==
X-Received: by 10.223.139.152 with SMTP id
	o24mr30547865wra.243.1514376922233;
	Wed, 27 Dec 2017 04:15:22 -0800 (PST)
Received: from localhost.localdomain (146-241-25-197.dyn.eolo.it.
	[146.241.25.197]) by smtp.gmail.com with ESMTPSA id
	i47sm6387297wra.97.2017.12.27.04.15.21
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Wed, 27 Dec 2017 04:15:21 -0800 (PST)
From: Paolo Valente <paolo.valente@linaro.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	ulf.hansson@linaro.org, broonie@kernel.org,
	linus.walleij@linaro.org, bfq-iosched@googlegroups.com,
	Paolo Valente <paolo.valente@linaro.org>
Subject: [PATCH IMPROVEMENT/BUGFIX 1/1] block,
	bfq: limit tags for writes and async I/O
Date: Wed, 27 Dec 2017 13:15:07 +0100
Message-Id: <20171227121507.4280-2-paolo.valente@linaro.org>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20171227121507.4280-1-paolo.valente@linaro.org>
References: <20171227121507.4280-1-paolo.valente@linaro.org>
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-block.vger.kernel.org>
X-Mailing-List: linux-block@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Asynchronous I/O can easily starve synchronous I/O (both sync reads
and sync writes) by consuming all request tags. Similarly, storms of
synchronous writes, such as those that sync(2) may trigger, can starve
synchronous reads. In turn, these two problems may cause BFQ to lose
control of latency for interactive and soft real-time applications.
For example, on a PLEXTOR PX-256M5S SSD, LibreOffice Writer takes 0.6
seconds to start if the device is idle, but more than 45 seconds (!)
if there are sequential writes in the background.

This commit addresses this issue by limiting the maximum percentage of
tags that asynchronous I/O requests and synchronous write requests can
consume. In particular, this commit grants a higher threshold to
synchronous writes, to prevent the latter from being starved by
asynchronous I/O.

In the above test, LibreOffice Writer now starts in about 1.2 seconds
on average, regardless of the background workload, apart from a few
rare outliers. To check this improvement, run, e.g.,
sudo ./comm_startup_lat.sh bfq 5 5 seq 10 "lowriter --terminate_after_init"
for the comm_startup_lat benchmark in the S suite [1].

[1] https://github.com/Algodev-github/S

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
---
 block/bfq-iosched.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/bfq-iosched.h | 12 +++++++++
 2 files changed, 89 insertions(+)
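
Note for reviewers (not part of the commit message): the per-word depth
limits set in bfq_update_depths can be checked in userspace with the
minimal sketch below. It only mirrors the four formulas and the
[weight-raising][sync] indexing from the patch. The names max_u,
compute_word_depths and pick_depth are hypothetical helpers invented for
this illustration, and the assert on the shift is an assumption added
only to keep the multiplications from wrapping in the sketch.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the kernel's max() macro. */
static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/*
 * Mirror of the formulas in bfq_update_depths for a given sbitmap
 * shift. d[wr][sync]: wr is 1 if some queue is being weight-raised,
 * sync is 0 for async I/O and 1 for sync writes.
 */
static void compute_word_depths(unsigned int sb_shift, unsigned int d[2][2])
{
	unsigned int word;

	/* Sketch-only guard: (word * 6) must fit in 32 bits. */
	assert(sb_shift <= 29);
	word = 1U << sb_shift;

	d[0][0] = max_u(word >> 1, 1U);       /* 50% for async I/O */
	d[0][1] = max_u((word * 3) >> 2, 1U); /* 75% for sync writes */
	d[1][0] = max_u((word * 3) >> 4, 1U); /* 3/16 for async I/O */
	d[1][1] = max_u((word * 6) >> 4, 1U); /* 6/16 for sync writes */
}

/* Same indexing as in bfq_limit_depth: [!!wr_busy_queues][is_sync]. */
static unsigned int pick_depth(unsigned int d[2][2],
			       unsigned int wr_busy_queues, int is_sync)
{
	return d[!!wr_busy_queues][!!is_sync];
}
```

With a shift of 6 (64 tags per word), the sketch yields 32 and 48 tags
when no queue is weight-raised, and 12 and 24 tags otherwise. With a
shift of 0, every limit is clamped to 1, which is the purpose of the
max(..., 1U) in the patch.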

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index e33c5c4c9856..6f75015d18c0 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -417,6 +417,82 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd,
 	}
 }
 
+/*
+ * See the comments on bfq_limit_depth for the purpose of
+ * the depths set in the function.
+ */
+static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
+{
+	bfqd->sb_shift = bt->sb.shift;
+
+	/*
+	 * In-word depths if no bfq_queue is being weight-raised:
+	 * leaving 25% of tags only for sync reads.
+	 *
+	 * In the next formulas, right-shift the value
+	 * (1U<<bfqd->sb_shift), instead of computing directly
+	 * (1U<<(bfqd->sb_shift - something)), to be robust against
+	 * any possible value of bfqd->sb_shift, without having to
+	 * limit 'something'.
+	 */
+	/* no more than 50% of tags for async I/O */
+	bfqd->word_depths[0][0] = max((1U<<bfqd->sb_shift)>>1, 1U);
+	/*
+	 * no more than 75% of tags for sync writes (25% extra tags
+	 * w.r.t. async I/O, to prevent async I/O from starving sync
+	 * writes)
+	 */
+	bfqd->word_depths[0][1] = max(((1U<<bfqd->sb_shift) * 3)>>2, 1U);
+
+	/*
+	 * In-word depths in case some bfq_queue is being weight-
+	 * raised: leaving ~63% of tags for sync reads. This is the
+	 * highest percentage for which, in our tests, application
+	 * start-up times didn't suffer from any regression due to tag
+	 * shortage.
+	 */
+	/* no more than ~18% of tags for async I/O */
+	bfqd->word_depths[1][0] = max(((1U<<bfqd->sb_shift) * 3)>>4, 1U);
+	/* no more than ~37% of tags for sync writes (~20% extra tags) */
+	bfqd->word_depths[1][1] = max(((1U<<bfqd->sb_shift) * 6)>>4, 1U);
+}
+
+/*
+ * Async I/O can easily starve sync I/O (both sync reads and sync
+ * writes), by consuming all tags. Similarly, storms of sync writes,
+ * such as those that sync(2) may trigger, can starve sync reads.
+ * Limit depths of async I/O and sync writes so as to counter both
+ * problems.
+ */
+static void bfq_limit_depth(unsigned int op, struct blk_mq_alloc_data *data)
+{
+	struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
+	struct bfq_data *bfqd = data->q->elevator->elevator_data;
+	struct sbitmap_queue *bt;
+
+	if (op_is_sync(op) && !op_is_write(op))
+		return;
+
+	if (data->flags & BLK_MQ_REQ_RESERVED) {
+		if (unlikely(!tags->nr_reserved_tags)) {
+			WARN_ON_ONCE(1);
+			return;
+		}
+		bt = &tags->breserved_tags;
+	} else
+		bt = &tags->bitmap_tags;
+
+	if (unlikely(bfqd->sb_shift != bt->sb.shift))
+		bfq_update_depths(bfqd, bt);
+
+	data->shallow_depth =
+		bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(op)];
+
+	bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
+			__func__, bfqd->wr_busy_queues, op_is_sync(op),
+			data->shallow_depth);
+}
+
 static struct bfq_queue *
 bfq_rq_pos_tree_lookup(struct bfq_data *bfqd, struct rb_root *root,
 		     sector_t sector, struct rb_node **ret_parent,
@@ -5265,6 +5341,7 @@ static struct elv_fs_entry bfq_attrs[] = {
 
 static struct elevator_type iosched_bfq_mq = {
 	.ops.mq = {
+		.limit_depth		= bfq_limit_depth,
 		.prepare_request	= bfq_prepare_request,
 		.finish_request		= bfq_finish_request,
 		.exit_icq		= bfq_exit_icq,
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 5d47b58d5fc8..fcd941008127 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -629,6 +629,18 @@ struct bfq_data {
 	struct bfq_io_cq *bio_bic;
 	/* bfqq associated with the task issuing current bio for merging */
 	struct bfq_queue *bio_bfqq;
+
+	/*
+	 * Cached sbitmap shift, used to compute depth limits in
+	 * bfq_update_depths.
+	 */
+	unsigned int sb_shift;
+
+	/*
+	 * Depth limits used in bfq_limit_depth (see comments on the
+	 * function)
+	 */
+	unsigned int word_depths[2][2];
 };
 
 enum bfqq_state_flags {