[3/6] block: cache current nsec time in struct blk_plug

Message ID	20240123173310.1966157-4-axboe@kernel.dk (mailing list archive)
State	New, archived
Headers	show Received: from mail-io1-f51.google.com (mail-io1-f51.google.com [209.85.166.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CC3737FBA1 for <linux-block@vger.kernel.org>; Tue, 23 Jan 2024 17:33:20 +0000 (UTC) From: Jens Axboe <axboe@kernel.dk> To: linux-block@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk>, Johannes Thumshirn <johannes.thumshirn@wdc.com> Subject: [PATCH 3/6] block: cache current nsec time in struct blk_plug Date: Tue, 23 Jan 2024 10:30:35 -0700 Message-ID: <20240123173310.1966157-4-axboe@kernel.dk> In-Reply-To: <20240123173310.1966157-1-axboe@kernel.dk> References: <20240123173310.1966157-1-axboe@kernel.dk> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	Cache issue side time querying \| expand [PATCHSET,v4,0/6] Cache issue side time querying [1/6] block: move cgroup time handling code into blkdev.h [2/6] block: add blk_time_get_ns() and blk_time_get() helpers [3/6] block: cache current nsec time in struct blk_plug [4/6] block: update cached timestamp post schedule/preemption [5/6] block: shrink plug->{nr_ios, rq_count} to unsigned char [6/6] block: convert struct blk_plug callback list to hlists

Message ID

20240123173310.1966157-4-axboe@kernel.dk (mailing list archive)

State

New, archived

Headers

From: Jens Axboe <axboe@kernel.dk>
To: linux-block@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>
Subject: [PATCH 3/6] block: cache current nsec time in struct blk_plug
Date: Tue, 23 Jan 2024 10:30:35 -0700
Message-ID: <20240123173310.1966157-4-axboe@kernel.dk>
In-Reply-To: <20240123173310.1966157-1-axboe@kernel.dk>
References: <20240123173310.1966157-1-axboe@kernel.dk>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

Cache issue side time querying | expand

Commit Message

Jens Axboe Jan. 23, 2024, 5:30 p.m. UTC

Querying the current time is the most costly thing we do in the block
layer per IO, and depending on kernel config settings, we may do it
many times per IO.

None of the callers actually need nsec granularity. Take advantage of
that by caching the current time in the plug, with the assumption here
being that any time checking will be temporally close enough that the
slight loss of precision doesn't matter.

If the block plug gets flushed, eg on preempt or schedule out, then
we invalidate the cached clock.

On a basic peak IOPS test case with iostats enabled, this changes
the performance from:

IOPS=108.41M, BW=52.93GiB/s, IOS/call=31/31
IOPS=108.43M, BW=52.94GiB/s, IOS/call=32/32
IOPS=108.29M, BW=52.88GiB/s, IOS/call=31/32
IOPS=108.35M, BW=52.91GiB/s, IOS/call=32/32
IOPS=108.42M, BW=52.94GiB/s, IOS/call=31/31
IOPS=108.40M, BW=52.93GiB/s, IOS/call=32/32
IOPS=108.31M, BW=52.89GiB/s, IOS/call=32/31

to

IOPS=118.79M, BW=58.00GiB/s, IOS/call=31/32
IOPS=118.62M, BW=57.92GiB/s, IOS/call=31/31
IOPS=118.80M, BW=58.01GiB/s, IOS/call=32/31
IOPS=118.78M, BW=58.00GiB/s, IOS/call=32/32
IOPS=118.69M, BW=57.95GiB/s, IOS/call=32/31
IOPS=118.62M, BW=57.92GiB/s, IOS/call=32/31
IOPS=118.63M, BW=57.92GiB/s, IOS/call=31/32

which is more than a 9% improvement in performance. Looking at perf diff,
we can see a huge reduction in time overhead:

    10.55%     -9.88%  [kernel.vmlinux]  [k] read_tsc
     1.31%     -1.22%  [kernel.vmlinux]  [k] ktime_get

Note that since this relies on blk_plug for the caching, it's only
applicable to the issue side. But this is where most of the time calls
happen anyway. On the completion side, cached time stamping is done with
struct io_comp patch, as long as the driver supports it.

It's also worth noting that the above testing doesn't enable any of the
higher cost CPU items on the block layer side, like wbt, cgroups,
iocost, etc, which all would add additional time querying and hence
overhead. IOW, results would likely look even better in comparison with
those enabled, as distros would do.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/blk-core.c       |  1 +
 include/linux/blkdev.h | 15 ++++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 11342af420d0..cc4db4d92c75 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1073,6 +1073,7 @@  void blk_start_plug_nr_ios(struct blk_plug *plug, unsigned short nr_ios)
 	if (tsk->plug)
 		return;
 
+	plug->cur_ktime = 0;
 	plug->mq_list = NULL;
 	plug->cached_rq = NULL;
 	plug->nr_ios = min_t(unsigned short, nr_ios, BLK_MAX_REQUEST_COUNT);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 40d3d7da39a4..d59610483619 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -943,6 +943,7 @@  struct blk_plug {
 
 	/* if ios_left is > 1, we can batch tag/rq allocations */
 	struct request *cached_rq;
+	u64 cur_ktime;
 	unsigned short nr_ios;
 
 	unsigned short rq_count;
@@ -978,7 +979,19 @@  long nr_blockdev_pages(void);
 
 static inline u64 blk_time_get_ns(void)
 {
-	return ktime_get_ns();
+	struct blk_plug *plug = current->plug;
+
+	if (!plug)
+		return ktime_get_ns();
+
+	/*
+	 * 0 could very well be a valid time, but rather than flag "this is
+	 * a valid timestamp" separately, just accept that we'll do an extra
+	 * ktime_get_ns() if we just happen to get 0 as the current time.
+	 */
+	if (!plug->cur_ktime)
+		plug->cur_ktime = ktime_get_ns();
+	return plug->cur_ktime;
 }
 
 static inline ktime_t blk_time_get(void)

[3/6] block: cache current nsec time in struct blk_plug

Commit Message

Patch