From patchwork Thu Dec 28 17:57:41 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Lyle <mlyle@lyle.org>
X-Patchwork-Id: 10135519
From: Michael Lyle <mlyle@lyle.org>
To: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org
Cc: Michael Lyle <mlyle@lyle.org>
Subject: [for-416 PATCH v2] bcache: writeback: collapse contiguous IO better
Date: Thu, 28 Dec 2017 09:57:41 -0800
Message-Id: <20171228175741.15453-1-mlyle@lyle.org>
X-Mailer: git-send-email 2.14.1
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-block.vger.kernel.org>
X-Mailing-List: linux-block@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

Previously, there was some logic that attempted to immediately issue
writeback of backing-contiguous blocks when the writeback rate was
fast.

The previous logic did not have any limits on the aggregate size it
would issue, nor on the number of keys it would combine at once.  It
would also discard the chance to do a contiguous write when the
writeback rate was low.  For example, at "background" writeback with a
target rate of 8, it would not combine two adjacent 4k writes and
would instead seek the disk twice.

This patch imposes limits and explicitly tracks the size of
contiguous I/O during issue.  It will also combine contiguous I/O
in all circumstances, not just when writeback is requested to be
relatively fast.

It is a win on its own, but also lays the groundwork for skip writes to
short keys to make the I/O more sequential/contiguous.  It also prepares
for using blk_*_plug, and for issuing non-contiguous I/O in parallel
if requested by the user (to benefit from the higher disk throughput
available at higher queue depths).

This patch fixes a previous version where the contiguity information
was not calculated properly, thanks to suggestions by Tang Junhui.

Signed-off-by: Michael Lyle <mlyle@lyle.org>
---
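
For reviewers, the gathering rule described above can be sketched in
userspace as follows.  This is only an illustration under simplifying
assumptions: struct extent and gather_pass() are hypothetical names that
do not exist in bcache, a key is modelled as a plain start/size pair in
sectors, and the inode part of the key comparison is ignored.  The two
limits use the same values as the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WRITEBACKS_IN_PASS 5    /* same value as in the patch */
#define MAX_WRITESIZE_IN_PASS  5000 /* in 512-byte sectors, as in the patch */

/* Hypothetical stand-in for a dirty key: an extent on the backing device. */
struct extent {
	uint64_t start; /* first sector */
	uint64_t size;  /* length in sectors */
};

/*
 * Gather one pass beginning at ext[pos].  Returns how many extents were
 * taken and stores their combined size in *total.  An extent is taken
 * only if neither limit has been reached and it begins exactly where
 * the previously taken extent ended.
 */
size_t gather_pass(const struct extent *ext, size_t n, size_t pos,
		   uint64_t *total)
{
	size_t nk = 0;
	uint64_t size = 0;

	assert(ext != NULL && total != NULL);

	while (pos + nk < n) {
		const struct extent *next = &ext[pos + nk];

		/* Don't combine too many operations, even small ones. */
		if (nk >= MAX_WRITEBACKS_IN_PASS)
			break;

		/* Don't grow a pass that is already very large. */
		if (size >= MAX_WRITESIZE_IN_PASS)
			break;

		/* Only contiguous extents may be combined. */
		if (nk != 0) {
			const struct extent *prev = &ext[pos + nk - 1];

			if (prev->start + prev->size != next->start)
				break;
		}

		size += next->size;
		nk++;
	}

	*total = size;
	return nk;
}
```

With extents (0,8) (8,8) (16,8) (100,8) (108,6000) (6108,8), the first
pass takes the three adjacent 8-sector extents and stops at the gap
before sector 100.  The second pass takes (100,8) and (108,6000) and
then stops because the size limit is exceeded.  Note that the size check
runs before an extent is added, so a pass can end up larger than
MAX_WRITESIZE_IN_PASS, as it can in the patch.
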
 drivers/md/bcache/bcache.h    |   6 ---
 drivers/md/bcache/writeback.c | 120 +++++++++++++++++++++++++++++-------------
 drivers/md/bcache/writeback.h |   3 ++
 3 files changed, 85 insertions(+), 44 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 843877e017e1..1784e50eb857 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -323,12 +323,6 @@ struct cached_dev {
 	struct bch_ratelimit	writeback_rate;
 	struct delayed_work	writeback_rate_update;
 
-	/*
-	 * Internal to the writeback code, so read_dirty() can keep track of
-	 * where it's at.
-	 */
-	sector_t		last_read;
-
 	/* Limit number of writeback bios in flight */
 	struct semaphore	in_flight;
 	struct task_struct	*writeback_thread;
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index f3d680c907ae..6975003454d9 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -251,7 +251,9 @@ static void read_dirty_submit(struct closure *cl)
 static void read_dirty(struct cached_dev *dc)
 {
 	unsigned delay = 0;
-	struct keybuf_key *w;
+	struct keybuf_key *next, *keys[MAX_WRITEBACKS_IN_PASS], *w;
+	size_t size;
+	int nk, i;
 	struct dirty_io *io;
 	struct closure cl;
 
@@ -262,45 +264,87 @@ static void read_dirty(struct cached_dev *dc)
 	 * mempools.
 	 */
 
-	while (!kthread_should_stop()) {
-
-		w = bch_keybuf_next(&dc->writeback_keys);
-		if (!w)
-			break;
-
-		BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
-
-		if (KEY_START(&w->key) != dc->last_read ||
-		    jiffies_to_msecs(delay) > 50)
-			while (!kthread_should_stop() && delay)
-				delay = schedule_timeout_interruptible(delay);
-
-		dc->last_read	= KEY_OFFSET(&w->key);
-
-		io = kzalloc(sizeof(struct dirty_io) + sizeof(struct bio_vec)
-			     * DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS),
-			     GFP_KERNEL);
-		if (!io)
-			goto err;
-
-		w->private	= io;
-		io->dc		= dc;
-
-		dirty_init(w);
-		bio_set_op_attrs(&io->bio, REQ_OP_READ, 0);
-		io->bio.bi_iter.bi_sector = PTR_OFFSET(&w->key, 0);
-		bio_set_dev(&io->bio, PTR_CACHE(dc->disk.c, &w->key, 0)->bdev);
-		io->bio.bi_end_io	= read_dirty_endio;
-
-		if (bio_alloc_pages(&io->bio, GFP_KERNEL))
-			goto err_free;
-
-		trace_bcache_writeback(&w->key);
+	next = bch_keybuf_next(&dc->writeback_keys);
+
+	while (!kthread_should_stop() && next) {
+		size = 0;
+		nk = 0;
+
+		do {
+			BUG_ON(ptr_stale(dc->disk.c, &next->key, 0));
+
+			/*
+			 * Don't combine too many operations, even if they
+			 * are all small.
+			 */
+			if (nk >= MAX_WRITEBACKS_IN_PASS)
+				break;
+
+			/*
+			 * If the current operation is very large, don't
+			 * further combine operations.
+			 */
+			if (size >= MAX_WRITESIZE_IN_PASS)
+				break;
+
+			/*
+			 * Operations are only eligible to be combined
+			 * if they are contiguous.
+			 *
+			 * TODO: add a heuristic willing to fire a
+			 * certain amount of non-contiguous IO per pass,
+			 * so that we can benefit from backing device
+			 * command queueing.
+			 */
+			if ((nk != 0) && bkey_cmp(&keys[nk-1]->key,
+						&START_KEY(&next->key)))
+				break;
+
+			size += KEY_SIZE(&next->key);
+			keys[nk++] = next;
+		} while ((next = bch_keybuf_next(&dc->writeback_keys)));
+
+		/* Now we have gathered a set of 1..5 keys to write back. */
+		for (i = 0; i < nk; i++) {
+			w = keys[i];
+
+			io = kzalloc(sizeof(struct dirty_io) +
+				     sizeof(struct bio_vec) *
+				     DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS),
+				     GFP_KERNEL);
+			if (!io)
+				goto err;
+
+			w->private	= io;
+			io->dc		= dc;
+
+			dirty_init(w);
+			bio_set_op_attrs(&io->bio, REQ_OP_READ, 0);
+			io->bio.bi_iter.bi_sector = PTR_OFFSET(&w->key, 0);
+			bio_set_dev(&io->bio,
+				    PTR_CACHE(dc->disk.c, &w->key, 0)->bdev);
+			io->bio.bi_end_io	= read_dirty_endio;
+
+			if (bio_alloc_pages(&io->bio, GFP_KERNEL))
+				goto err_free;
+
+			trace_bcache_writeback(&w->key);
+
+			down(&dc->in_flight);
+
+			/* We've acquired a semaphore for the maximum
+			 * simultaneous number of writebacks; from here
+			 * everything happens asynchronously.
+			 */
+			closure_call(&io->cl, read_dirty_submit, NULL, &cl);
+		}
 
-		down(&dc->in_flight);
-		closure_call(&io->cl, read_dirty_submit, NULL, &cl);
+		delay = writeback_delay(dc, size);
 
-		delay = writeback_delay(dc, KEY_SIZE(&w->key));
+		while (!kthread_should_stop() && delay) {
+			schedule_timeout_interruptible(delay);
+			delay = writeback_delay(dc, 0);
+		}
 	}
 
 	if (0) {
diff --git a/drivers/md/bcache/writeback.h b/drivers/md/bcache/writeback.h
index a9e3ffb4b03c..6d26927267f8 100644
--- a/drivers/md/bcache/writeback.h
+++ b/drivers/md/bcache/writeback.h
@@ -5,6 +5,9 @@
 #define CUTOFF_WRITEBACK	40
 #define CUTOFF_WRITEBACK_SYNC	70
 
+#define MAX_WRITEBACKS_IN_PASS  5
+#define MAX_WRITESIZE_IN_PASS   5000	/* in 512-byte sectors */
+
 static inline uint64_t bcache_dev_sectors_dirty(struct bcache_device *d)
 {
 	uint64_t i, ret = 0;