
[PATCHv8,3/6] block: introduce max_write_hints queue limit

Message ID 20241017160937.2283225-4-kbusch@meta.com
State New, archived
Series write hints for nvme fdp

Commit Message

Keith Busch Oct. 17, 2024, 4:09 p.m. UTC
From: Keith Busch <kbusch@kernel.org>

Drivers with hardware that support write hints need a way to export how
many are available so applications can generically query this.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 Documentation/ABI/stable/sysfs-block |  7 +++++++
 block/blk-settings.c                 |  3 +++
 block/blk-sysfs.c                    |  3 +++
 block/fops.c                         |  2 ++
 include/linux/blkdev.h               | 12 ++++++++++++
 5 files changed, 27 insertions(+)
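As a rough illustration of the "generic query" the commit message has in mind, the sketch below reads the attribute this patch documents, /sys/block/<disk>/queue/max_write_hints, from userspace. The helper name read_max_write_hints() is hypothetical, and the sketch assumes a kernel carrying this patch; on other kernels the file is absent and the helper returns a negative errno.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read the decimal value exported by the proposed
 * /sys/block/<disk>/queue/max_write_hints attribute (or any file that
 * holds one unsigned number). Returns the value on success, -errno on
 * failure. A return of 0 means the device reports no write hint support.
 */
static long read_max_write_hints(const char *path)
{
	unsigned long val;
	FILE *f;
	int ret;

	f = fopen(path, "r");
	if (!f)
		return errno ? -errno : -EIO;

	ret = fscanf(f, "%lu", &val);
	fclose(f);
	if (ret != 1)
		return -EINVAL;

	/* The limit is stored as an unsigned short in struct queue_limits. */
	if (val > 65535UL)
		return -ERANGE;
	return (long)val;
}
```

A caller would pass a path such as /sys/block/nvme0n1/queue/max_write_hints (the disk name is only an example) and treat 0 as "no write hints"; any other value is the upper bound of the valid range 1..max_write_hints.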

Comments

Christoph Hellwig Oct. 18, 2024, 5:51 a.m. UTC | #1
On Thu, Oct 17, 2024 at 09:09:34AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Drivers with hardware that support write hints need a way to export how
> many are available so applications can generically query this.

Calling this write hints vs write streams is very confusing.

Otherwise this looks reasonable.

Hannes Reinecke Oct. 18, 2024, 6:01 a.m. UTC | #2
On 10/17/24 18:09, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Drivers with hardware that support write hints need a way to export how
> many are available so applications can generically query this.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   Documentation/ABI/stable/sysfs-block |  7 +++++++
>   block/blk-settings.c                 |  3 +++
>   block/blk-sysfs.c                    |  3 +++
>   block/fops.c                         |  2 ++
>   include/linux/blkdev.h               | 12 ++++++++++++
>   5 files changed, 27 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes

Bart Van Assche Oct. 18, 2024, 4:18 p.m. UTC | #3
On 10/17/24 9:09 AM, Keith Busch wrote:
> Drivers with hardware that support write hints need a way to export how
> many are available so applications can generically query this.

Something is missing from this patch, namely a change for the SCSI disk
(sd) driver that sets max_write_hints to sdkp->permanent_stream_count.

> +What:		/sys/block/<disk>/queue/max_write_hints
> +Date:		October 2024
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RO] Maximum number of write hints supported, 0 if not
> +		supported. If supported, valid values are 1 through
> +		max_write_hints, inclusive.

That's a bit short. I think it would help to add a reference to the
aspects of the standards related to this attribute: permanent streams
for SCSI and FDP for NVMe.

> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index a446654ddee5e..921fb4d334fa4 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -43,6 +43,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>   	lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
>   
>   	/* Inherit limits from component devices */
> +	lim->max_write_hints = USHRT_MAX;
>   	lim->max_segments = USHRT_MAX;
>   	lim->max_discard_segments = USHRT_MAX;
>   	lim->max_hw_sectors = UINT_MAX;
> @@ -544,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>   	t->max_segment_size = min_not_zero(t->max_segment_size,
>   					   b->max_segment_size);
>   
> +	t->max_write_hints = min(t->max_write_hints, b->max_write_hints);
> +
>   	alignment = queue_limit_alignment_offset(b, start);
>   

I prefer that lim->max_write_hints is initialized to zero in
blk_set_stacking_limits() and that blk_stack_limits() uses
min_not_zero().

Thanks,

Bart.

Keith Busch Oct. 21, 2024, 3:02 p.m. UTC | #4
On Fri, Oct 18, 2024 at 09:18:34AM -0700, Bart Van Assche wrote:
> On 10/17/24 9:09 AM, Keith Busch wrote:
> > Drivers with hardware that support write hints need a way to export how
> > many are available so applications can generically query this.
> 
> Something is missing from this patch, namely a change for the SCSI disk
> (sd) driver that sets max_write_hints to sdkp->permanent_stream_count.

Shouldn't someone who cares about scsi do that? I certainly don't care,
nor have I been keeping up with what's happening there, so I'm also
unqualified.
 
> > +What:		/sys/block/<disk>/queue/max_write_hints
> > +Date:		October 2024
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RO] Maximum number of write hints supported, 0 if not
> > +		supported. If supported, valid values are 1 through
> > +		max_write_hints, inclusive.
> 
> That's a bit short. I think it would help to add a reference to the
> aspects of the standards related to this attribute: permanent streams
> for SCSI and FDP for NVMe.

The specs regarding write hints have not historically been stable, so
I'd rather not tie kernel docs to volatile external specifications.

> > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > index a446654ddee5e..921fb4d334fa4 100644
> > --- a/block/blk-settings.c
> > +++ b/block/blk-settings.c
> > @@ -43,6 +43,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
> >   	lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
> >   	/* Inherit limits from component devices */
> > +	lim->max_write_hints = USHRT_MAX;
> >   	lim->max_segments = USHRT_MAX;
> >   	lim->max_discard_segments = USHRT_MAX;
> >   	lim->max_hw_sectors = UINT_MAX;
> > @@ -544,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >   	t->max_segment_size = min_not_zero(t->max_segment_size,
> >   					   b->max_segment_size);
> > +	t->max_write_hints = min(t->max_write_hints, b->max_write_hints);
> > +
> >   	alignment = queue_limit_alignment_offset(b, start);
> 
> I prefer that lim->max_write_hints is initialized to zero in
> blk_set_stacking_limits() and that blk_stack_limits() uses
> min_not_zero().

How is a device supposed to report it doesn't support a write hint if 0
gets overridden?
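The disagreement over initialization can be made concrete with a small userspace model. The sketch below compares the patch's approach (start from USHRT_MAX and take min()) with the review suggestion (start from 0 and take min_not_zero()), assuming bottom devices are folded in one at a time as blk_stack_limits() does. min_u16(), min_not_zero_u16() and the two stack_* functions are hypothetical stand-ins, not kernel code; the kernel's min() and min_not_zero() are macros.

```c
#include <limits.h>

/* Userspace stand-in for the kernel's min(). */
static unsigned short min_u16(unsigned short a, unsigned short b)
{
	return a < b ? a : b;
}

/* Userspace stand-in for the kernel's min_not_zero(): 0 means "unset". */
static unsigned short min_not_zero_u16(unsigned short a, unsigned short b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return min_u16(a, b);
}

/* Patch as posted: start at USHRT_MAX, combine with plain min(). */
static unsigned short stack_with_min(const unsigned short *bottom, int n)
{
	unsigned short top = USHRT_MAX;

	for (int i = 0; i < n; i++)
		top = min_u16(top, bottom[i]);
	return top;
}

/* Review suggestion: start at 0, combine with min_not_zero(). */
static unsigned short stack_with_min_not_zero(const unsigned short *bottom,
					      int n)
{
	unsigned short top = 0;

	for (int i = 0; i < n; i++)
		top = min_not_zero_u16(top, bottom[i]);
	return top;
}
```

With one component reporting 8 hints and another reporting 0, the min() variant yields 0, while the min_not_zero() variant yields 8 and so advertises hints that the second component cannot honor. That is the case the question above is about: under min_not_zero(), a 0 from a bottom device is indistinguishable from "no limit set".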

Patch

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 8353611107154..f2db2cabb8e75 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -506,6 +506,13 @@  Description:
 		[RO] Maximum size in bytes of a single element in a DMA
 		scatter/gather list.
 
+What:		/sys/block/<disk>/queue/max_write_hints
+Date:		October 2024
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Maximum number of write hints supported, 0 if not
+		supported. If supported, valid values are 1 through
+		max_write_hints, inclusive.
 
 What:		/sys/block/<disk>/queue/max_segments
 Date:		March 2010
diff --git a/block/blk-settings.c b/block/blk-settings.c
index a446654ddee5e..921fb4d334fa4 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -43,6 +43,7 @@  void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
 
 	/* Inherit limits from component devices */
+	lim->max_write_hints = USHRT_MAX;
 	lim->max_segments = USHRT_MAX;
 	lim->max_discard_segments = USHRT_MAX;
 	lim->max_hw_sectors = UINT_MAX;
@@ -544,6 +545,8 @@  int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_write_hints = min(t->max_write_hints, b->max_write_hints);
+
 	alignment = queue_limit_alignment_offset(b, start);
 
 	/* Bottom device has different alignment.  Check that it is
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 741b95dfdbf6f..85f48ca461049 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -104,6 +104,7 @@  QUEUE_SYSFS_LIMIT_SHOW(max_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_discard_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_integrity_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_segment_size)
+QUEUE_SYSFS_LIMIT_SHOW(max_write_hints)
 QUEUE_SYSFS_LIMIT_SHOW(logical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(physical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(chunk_sectors)
@@ -457,6 +458,7 @@  QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
 QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
 QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
 QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size");
+QUEUE_RO_ENTRY(queue_max_write_hints, "max_write_hints");
 QUEUE_RW_LOAD_MODULE_ENTRY(elv_iosched, "scheduler");
 
 QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size");
@@ -591,6 +593,7 @@  static struct attribute *queue_attrs[] = {
 	&queue_max_discard_segments_entry.attr,
 	&queue_max_integrity_segments_entry.attr,
 	&queue_max_segment_size_entry.attr,
+	&queue_max_write_hints_entry.attr,
 	&queue_hw_sector_size_entry.attr,
 	&queue_logical_block_size_entry.attr,
 	&queue_physical_block_size_entry.attr,
diff --git a/block/fops.c b/block/fops.c
index 85b9b97d372c8..d0b16d3975fd6 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -376,6 +376,8 @@  static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 	if (blkdev_dio_invalid(bdev, iocb->ki_pos, iter, is_atomic))
 		return -EINVAL;
+	if (iocb->ki_write_hint > bdev_max_write_hints(bdev))
+		return -EINVAL;
 
 	nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
 	if (likely(nr_pages <= BIO_MAX_VECS)) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6b78a68e0bd9c..01aba0ffeff6e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -393,6 +393,8 @@  struct queue_limits {
 	unsigned short		max_integrity_segments;
 	unsigned short		max_discard_segments;
 
+	unsigned short		max_write_hints;
+
 	unsigned int		max_open_zones;
 	unsigned int		max_active_zones;
 
@@ -1183,6 +1185,11 @@  static inline unsigned short queue_max_segments(const struct request_queue *q)
 	return q->limits.max_segments;
 }
 
+static inline unsigned short queue_max_write_hints(struct request_queue *q)
+{
+	return q->limits.max_write_hints;
+}
+
 static inline unsigned short queue_max_discard_segments(const struct request_queue *q)
 {
 	return q->limits.max_discard_segments;
@@ -1230,6 +1237,11 @@  static inline unsigned int bdev_max_segments(struct block_device *bdev)
 	return queue_max_segments(bdev_get_queue(bdev));
 }
 
+static inline unsigned short bdev_max_write_hints(struct block_device *bdev)
+{
+	return queue_max_write_hints(bdev_get_queue(bdev));
+}
+
 static inline unsigned queue_logical_block_size(const struct request_queue *q)
 {
 	return q->limits.logical_block_size;
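The user-visible contract documented in sysfs-block and enforced in blkdev_direct_IO() can be summarized with a minimal userspace model of the new check. struct limits_model and check_write_hint() are hypothetical; the real code reads the limit through bdev_max_write_hints() and compares it against iocb->ki_write_hint, whose type is simplified here to unsigned int.

```c
#include <errno.h>

/* Hypothetical, pared-down stand-in for struct queue_limits. */
struct limits_model {
	unsigned short max_write_hints;	/* 0: write hints not supported */
};

/*
 * Mirrors the check added to blkdev_direct_IO(): a hint larger than
 * max_write_hints is rejected with -EINVAL; anything else passes.
 */
static int check_write_hint(const struct limits_model *lim, unsigned int hint)
{
	if (hint > lim->max_write_hints)
		return -EINVAL;
	return 0;
}
```

Because the comparison is a plain "greater than", hint value 0 is never rejected by this check, even when max_write_hints is 0, while every nonzero hint is rejected on a device that reports no support.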