[PATCHv2] block: enable passthrough command statistics

Message ID	20241003153036.411721-1-kbusch@meta.com (mailing list archive)
State	New, archived
Headers	show Received: from mx0a-00082601.pphosted.com (mx0a-00082601.pphosted.com [67.231.145.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 008B91A0BF6 for <linux-block@vger.kernel.org>; Thu, 3 Oct 2024 15:30:44 +0000 (UTC) From: Keith Busch <kbusch@meta.com> To: <linux-block@vger.kernel.org>, <axboe@kernel.dk> CC: <hch@lst.de>, Keith Busch <kbusch@kernel.org> Subject: [PATCHv2] block: enable passthrough command statistics Date: Thu, 3 Oct 2024 08:30:36 -0700 Message-ID: <20241003153036.411721-1-kbusch@meta.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain
Series	[PATCHv2] block: enable passthrough command statistics \| expand [PATCHv2] block: enable passthrough command statistics

Keith Busch Oct. 3, 2024, 3:30 p.m. UTC

From: Keith Busch <kbusch@kernel.org>

Applications using the passthrough interfaces for IO want to continue
seeing the disk stats. These requests had been fenced off from this
block layer feature. While the block layer doesn't necessarily know what
a passthrough command does, we do know the data size and direction,
which is enough to account for the command's stats.

Since tracking these has the potential to produce unexpected results,
the passthrough stats are locked behind a new queue flag that needs to
be enabled with the /sys/block/<dev>/queue/iostats_passthrough
attribute.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v1->v2:

  Moved the passthrough stats enabling to its own attribute instead of
  overloading the existing iostats

  Added comments to the criteria for allowing passthrough stats

Based on block for-6.13/block tree:

  https://git.kernel.dk/cgit/linux-block/log/?h=for-6.13/block

 Documentation/ABI/stable/sysfs-block |  7 +++++++
 block/blk-mq-debugfs.c               |  1 +
 block/blk-mq.c                       | 27 ++++++++++++++++++++++++++-
 block/blk-sysfs.c                    | 26 ++++++++++++++++++++++++++
 include/linux/blkdev.h               |  3 +++
 5 files changed, 63 insertions(+), 1 deletion(-)

Jens Axboe Oct. 3, 2024, 3:40 p.m. UTC | #1

On 10/3/24 9:30 AM, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Applications using the passthrough interfaces for IO want to continue
> seeing the disk stats. These requests had been fenced off from this
> block layer feature. While the block layer doesn't necessarily know what
> a passthrough command does, we do know the data size and direction,
> which is enough to account for the command's stats.
> 
> Since tracking these has the potential to produce unexpected results,
> the passthrough stats are locked behind a new queue flag that needs to
> be enabled with the /sys/block/<dev>/queue/iostats_passthrough
> attribute.

Looks good to me.

Jens Axboe Oct. 3, 2024, 6:09 p.m. UTC | #2

On Thu, 03 Oct 2024 08:30:36 -0700, Keith Busch wrote:
> Applications using the passthrough interfaces for IO want to continue
> seeing the disk stats. These requests had been fenced off from this
> block layer feature. While the block layer doesn't necessarily know what
> a passthrough command does, we do know the data size and direction,
> which is enough to account for the command's stats.
> 
> Since tracking these has the potential to produce unexpected results,
> the passthrough stats are locked behind a new queue flag that needs to
> be enabled with the /sys/block/<dev>/queue/iostats_passthrough
> attribute.
> 
> [...]

Applied, thanks!

[1/1] block: enable passthrough command statistics
      commit: 663db31a86bc7da797ec62f301ef0d6058ff0721

Best regards,

Christoph Hellwig Oct. 4, 2024, 5:38 a.m. UTC | #3

On Thu, Oct 03, 2024 at 08:30:36AM -0700, Keith Busch wrote:
> +What:		/sys/block/<disk>/queue/iostats_passthrough
> +Date:		October 2024
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RW] This file is used to control (on/off) the iostats
> +		accounting of the disk for passthrough commands.
> +
>  
>  What:		/sys/block/<disk>/queue/logical_block_size
>  Date:		May 2009
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 5463697a84428..d9d7fd441297e 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -93,6 +93,7 @@ static const char *const blk_queue_flag_name[] = {
>  	QUEUE_FLAG_NAME(RQ_ALLOC_TIME),
>  	QUEUE_FLAG_NAME(HCTX_ACTIVE),
>  	QUEUE_FLAG_NAME(SQ_SCHED),
> +	QUEUE_FLAG_NAME(IOSTATS_PASSTHROUGH),
>  };
>  #undef QUEUE_FLAG_NAME
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8e75e3471ea58..cf309b39bac04 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -993,13 +993,38 @@ static inline void blk_account_io_done(struct request *req, u64 now)
>  	}
>  }
>  
> +static inline bool blk_rq_passthrough_stats(struct request *req)
> +{
> +	struct bio *bio = req->bio;
> +
> +	if (!blk_queue_passthrough_stat(req->q))
> +		return false;
> +
> +	/*
> +	 * Stats are accumulated in the bdev part, so must have one attached to
> +	 * a bio to do this
> +	 */
> +	if (!bio)
> +		return false;
> +	if (!bio->bi_bdev)
> +		return false;

Missing. At the end of the sentence.  But even then this doesn't
explain why not accouting these requests is fine.

 - requests without a bio are all those that don't transfer data
 - requests with a bio but not bdev are almost all passthrough requests
   as far as I can tell, with the only exception of nvme I/O command
   passthrough.

I.e. what we have here is a special casing for nvme I/O commands.  Maybe
that's fine, but the comments and commit log should leave a clearly
visible trace of that and not confuse the hell out of people trying to
understand the logic later.

> +	/*
> +	 * Ensuring the size is aligned to the block size prevents observing an
> +	 * invalid sectors stat.
> +	 */
> +	if (blk_rq_bytes(req) & (bdev_logical_block_size(bio->bi_bdev) - 1))
> +		return false;

Now this probably won't trigger anyway for the usual workload (although
it might for odd NVMe command sets like KV and the SLM), but I'd expect the
size to be rounded (probably up?) and not entirely dropped.

> +	ret = queue_var_store(&ios, page, count);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (ios)
> +		blk_queue_flag_set(QUEUE_FLAG_IOSTATS_PASSTHROUGH,
> +				   disk->queue);
> +	else
> +		blk_queue_flag_clear(QUEUE_FLAG_IOSTATS_PASSTHROUGH,
> +				     disk->queue);

Why is this using queue flags now?  This isn't really blk-mq internal,
so it should be using queue_limits->flag as pointed out last round.

Keith Busch Oct. 4, 2024, 3:04 p.m. UTC | #4

On Fri, Oct 04, 2024 at 07:38:28AM +0200, Christoph Hellwig wrote: > 
> Missing. At the end of the sentence.  But even then this doesn't
> explain why not accouting these requests is fine.
> 
>  - requests without a bio are all those that don't transfer data
>  - requests with a bio but not bdev are almost all passthrough requests
>    as far as I can tell, with the only exception of nvme I/O command
>    passthrough.
> 
> I.e. what we have here is a special casing for nvme I/O commands.  Maybe
> that's fine, but the comments and commit log should leave a clearly
> visible trace of that and not confuse the hell out of people trying to
> understand the logic later.

Even Jens was a little surprised to find nvme passthrough sets the bio
bi_bdev. I didn't think it was unusual, but sounds like we are doing
something special here.
 
> > +	/*
> > +	 * Ensuring the size is aligned to the block size prevents observing an
> > +	 * invalid sectors stat.
> > +	 */
> > +	if (blk_rq_bytes(req) & (bdev_logical_block_size(bio->bi_bdev) - 1))
> > +		return false;
> 
> Now this probably won't trigger anyway for the usual workload (although
> it might for odd NVMe command sets like KV and the SLM), but I'd expect the
> size to be rounded (probably up?) and not entirely dropped.

This prevents commands with payload sizes that are not representative of
sector access. Examples from NVMe include Copy, Dataset Management, and
all the Reservation commands. The transfer size of those commands are
unlikely to be a block aligned, so it's a simple way to filter them out.
Rounding the payload size up will produce misleading stats, so I think
it's better if they don't get to use the feature.

> > +	ret = queue_var_store(&ios, page, count);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	if (ios)
> > +		blk_queue_flag_set(QUEUE_FLAG_IOSTATS_PASSTHROUGH,
> > +				   disk->queue);
> > +	else
> > +		blk_queue_flag_clear(QUEUE_FLAG_IOSTATS_PASSTHROUGH,
> > +				     disk->queue);
> 
> Why is this using queue flags now?  This isn't really blk-mq internal,
> so it should be using queue_limits->flag as pointed out last round.

So many flags... The atomic limit update seemed overkill for just this
flag, but okay.

Christoph Hellwig Oct. 7, 2024, 5:56 a.m. UTC | #5

On Fri, Oct 04, 2024 at 09:04:13AM -0600, Keith Busch wrote:
> Even Jens was a little surprised to find nvme passthrough sets the bio
> bi_bdev. I didn't think it was unusual, but sounds like we are doing
> something special here.

IIRC it was added to support metadata passthrough, but I'd have to do
a little research to find the details.

> > > +	/*
> > > +	 * Ensuring the size is aligned to the block size prevents observing an
> > > +	 * invalid sectors stat.
> > > +	 */
> > > +	if (blk_rq_bytes(req) & (bdev_logical_block_size(bio->bi_bdev) - 1))
> > > +		return false;
> > 
> > Now this probably won't trigger anyway for the usual workload (although
> > it might for odd NVMe command sets like KV and the SLM), but I'd expect the
> > size to be rounded (probably up?) and not entirely dropped.
> 
> This prevents commands with payload sizes that are not representative of
> sector access. Examples from NVMe include Copy, Dataset Management, and
> all the Reservation commands. The transfer size of those commands are
> unlikely to be a block aligned, so it's a simple way to filter them out.
> Rounding the payload size up will produce misleading stats, so I think
> it's better if they don't get to use the feature.

True.  Please put this into the comments!

> 
> > > +	ret = queue_var_store(&ios, page, count);
> > > +	if (ret < 0)
> > > +		return ret;
> > > +
> > > +	if (ios)
> > > +		blk_queue_flag_set(QUEUE_FLAG_IOSTATS_PASSTHROUGH,
> > > +				   disk->queue);
> > > +	else
> > > +		blk_queue_flag_clear(QUEUE_FLAG_IOSTATS_PASSTHROUGH,
> > > +				     disk->queue);
> > 
> > Why is this using queue flags now?  This isn't really blk-mq internal,
> > so it should be using queue_limits->flag as pointed out last round.
> 
> So many flags... The atomic limit update seemed overkill for just this
> flag, but okay.

I've been slowly working on making q->flags entirely limited to
blk-mq internal state.  We're not quite there yet, but I'd like to
keep up the direction rather than having to fix it up later.

[PATCHv2] block: enable passthrough command statistics

Commit Message

Comments

Patch