Message ID | 20241017160937.2283225-4-kbusch@meta.com (mailing list archive) |
---|---|
State | New, archived |
Series | write hints for nvme fdp |
On Thu, Oct 17, 2024 at 09:09:34AM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> Drivers with hardware that support write hints need a way to export how
> many are available so applications can generically query this.

Calling this write hints vs write streams is very confusing.

Otherwise this looks reasonable.

On 10/17/24 18:09, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> Drivers with hardware that support write hints need a way to export how
> many are available so applications can generically query this.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  Documentation/ABI/stable/sysfs-block |  7 +++++++
>  block/blk-settings.c                 |  3 +++
>  block/blk-sysfs.c                    |  3 +++
>  block/fops.c                         |  2 ++
>  include/linux/blkdev.h               | 12 ++++++++++++
>  5 files changed, 27 insertions(+)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes

On 10/17/24 9:09 AM, Keith Busch wrote:
> Drivers with hardware that support write hints need a way to export how
> many are available so applications can generically query this.

Something is missing from this patch, namely a change for the SCSI disk
(sd) driver that sets max_write_hints to sdkp->permanent_stream_count.

> +What:		/sys/block/<disk>/queue/max_write_hints
> +Date:		October 2024
> +Contact:	linux-block@vger.kernel.org
> +Description:
> +		[RO] Maximum number of write hints supported, 0 if not
> +		supported. If supported, valid values are 1 through
> +		max_write_hints, inclusive.

That's a bit short. I think it would help to add a reference to the
aspects of the standards related to this attribute: permanent streams
for SCSI and FDP for NVMe.

> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index a446654ddee5e..921fb4d334fa4 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -43,6 +43,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>  	lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
>
>  	/* Inherit limits from component devices */
> +	lim->max_write_hints = USHRT_MAX;
>  	lim->max_segments = USHRT_MAX;
>  	lim->max_discard_segments = USHRT_MAX;
>  	lim->max_hw_sectors = UINT_MAX;
> @@ -544,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>  	t->max_segment_size = min_not_zero(t->max_segment_size,
>  					   b->max_segment_size);
>
> +	t->max_write_hints = min(t->max_write_hints, b->max_write_hints);
> +
>  	alignment = queue_limit_alignment_offset(b, start);
>

I prefer that lim->max_write_hints is initialized to zero in
blk_set_stacking_limits() and that blk_stack_limits() uses
min_not_zero().

Thanks,

Bart.

On Fri, Oct 18, 2024 at 09:18:34AM -0700, Bart Van Assche wrote:
> On 10/17/24 9:09 AM, Keith Busch wrote:
> > Drivers with hardware that support write hints need a way to export how
> > many are available so applications can generically query this.
>
> Something is missing from this patch, namely a change for the SCSI disk
> (sd) driver that sets max_write_hints to sdkp->permanent_stream_count.

Shouldn't someone who cares about scsi do that? I certainly don't care,
nor have I been keeping up with what's happening there, so I'm also
unqualified.

> > +What:		/sys/block/<disk>/queue/max_write_hints
> > +Date:		October 2024
> > +Contact:	linux-block@vger.kernel.org
> > +Description:
> > +		[RO] Maximum number of write hints supported, 0 if not
> > +		supported. If supported, valid values are 1 through
> > +		max_write_hints, inclusive.
>
> That's a bit short. I think it would help to add a reference to the
> aspects of the standards related to this attribute: permanent streams
> for SCSI and FDP for NVMe.

The specs regarding write hints have not historically been stable, so
I'd rather not tie kernel docs to volatile external specifications.

> > diff --git a/block/blk-settings.c b/block/blk-settings.c
> > index a446654ddee5e..921fb4d334fa4 100644
> > --- a/block/blk-settings.c
> > +++ b/block/blk-settings.c
> > @@ -43,6 +43,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
> >  	lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
> >  	/* Inherit limits from component devices */
> > +	lim->max_write_hints = USHRT_MAX;
> >  	lim->max_segments = USHRT_MAX;
> >  	lim->max_discard_segments = USHRT_MAX;
> >  	lim->max_hw_sectors = UINT_MAX;
> > @@ -544,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> >  	t->max_segment_size = min_not_zero(t->max_segment_size,
> >  					   b->max_segment_size);
> > +	t->max_write_hints = min(t->max_write_hints, b->max_write_hints);
> > +
> >  	alignment = queue_limit_alignment_offset(b, start);
>
> I prefer that lim->max_write_hints is initialized to zero in
> blk_set_stacking_limits() and that blk_stack_limits() uses
> min_not_zero().

How is a device supposed to report it doesn't support a write hint if 0
gets overridden?

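The disagreement over min() versus min_not_zero() can be made concrete with a small userspace model. This is only a sketch: min_us(), min_not_zero_us(), stack_min() and stack_min_not_zero() are hypothetical stand-ins written for illustration, not kernel functions, and the stacking loop is a simplification of what blk_stack_limits() does per component device.

```c
#include <assert.h>	/* only needed by callers that check the results */
#include <limits.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's min() on unsigned short values. */
static unsigned short min_us(unsigned short a, unsigned short b)
{
	return a < b ? a : b;
}

/* Userspace stand-in for min_not_zero(): a zero operand is ignored. */
static unsigned short min_not_zero_us(unsigned short a, unsigned short b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return min_us(a, b);
}

/* Patch as posted: start at USHRT_MAX and combine with min(). */
static unsigned short stack_min(const unsigned short *hints, size_t n)
{
	unsigned short t = USHRT_MAX;
	size_t i;

	for (i = 0; i < n; i++)
		t = min_us(t, hints[i]);
	return t;
}

/* Suggested alternative: start at 0 and combine with min_not_zero(). */
static unsigned short stack_min_not_zero(const unsigned short *hints, size_t n)
{
	unsigned short t = 0;
	size_t i;

	for (i = 0; i < n; i++)
		t = min_not_zero_us(t, hints[i]);
	return t;
}
```

For component limits of 8, 0 and 4, stack_min() yields 0, so the stacked device reports no write hint support, while stack_min_not_zero() yields 4 even though one component supports none. That is the case the question above is about. With no components at all, stack_min() leaves USHRT_MAX in place and stack_min_not_zero() leaves 0.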
diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 8353611107154..f2db2cabb8e75 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -506,6 +506,13 @@ Description:
 		[RO] Maximum size in bytes of a single element in a DMA
 		scatter/gather list.
 
+What:		/sys/block/<disk>/queue/max_write_hints
+Date:		October 2024
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Maximum number of write hints supported, 0 if not
+		supported. If supported, valid values are 1 through
+		max_write_hints, inclusive.
 
 What:		/sys/block/<disk>/queue/max_segments
 Date:		March 2010
diff --git a/block/blk-settings.c b/block/blk-settings.c
index a446654ddee5e..921fb4d334fa4 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -43,6 +43,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
 
 	/* Inherit limits from component devices */
+	lim->max_write_hints = USHRT_MAX;
 	lim->max_segments = USHRT_MAX;
 	lim->max_discard_segments = USHRT_MAX;
 	lim->max_hw_sectors = UINT_MAX;
@@ -544,6 +545,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->max_segment_size = min_not_zero(t->max_segment_size,
 					   b->max_segment_size);
 
+	t->max_write_hints = min(t->max_write_hints, b->max_write_hints);
+
 	alignment = queue_limit_alignment_offset(b, start);
 
 	/* Bottom device has different alignment. Check that it is
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 741b95dfdbf6f..85f48ca461049 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -104,6 +104,7 @@ QUEUE_SYSFS_LIMIT_SHOW(max_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_discard_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_integrity_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_segment_size)
+QUEUE_SYSFS_LIMIT_SHOW(max_write_hints)
 QUEUE_SYSFS_LIMIT_SHOW(logical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(physical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(chunk_sectors)
@@ -457,6 +458,7 @@ QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
 QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
 QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
 QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size");
+QUEUE_RO_ENTRY(queue_max_write_hints, "max_write_hints");
 QUEUE_RW_LOAD_MODULE_ENTRY(elv_iosched, "scheduler");
 
 QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size");
@@ -591,6 +593,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_max_discard_segments_entry.attr,
 	&queue_max_integrity_segments_entry.attr,
 	&queue_max_segment_size_entry.attr,
+	&queue_max_write_hints_entry.attr,
 	&queue_hw_sector_size_entry.attr,
 	&queue_logical_block_size_entry.attr,
 	&queue_physical_block_size_entry.attr,
diff --git a/block/fops.c b/block/fops.c
index 85b9b97d372c8..d0b16d3975fd6 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -376,6 +376,8 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 	if (blkdev_dio_invalid(bdev, iocb->ki_pos, iter, is_atomic))
 		return -EINVAL;
+	if (iocb->ki_write_hint > bdev_max_write_hints(bdev))
+		return -EINVAL;
 
 	nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
 	if (likely(nr_pages <= BIO_MAX_VECS)) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6b78a68e0bd9c..01aba0ffeff6e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -393,6 +393,8 @@ struct queue_limits {
 	unsigned short		max_integrity_segments;
 	unsigned short		max_discard_segments;
 
+	unsigned short		max_write_hints;
+
 	unsigned int		max_open_zones;
 	unsigned int		max_active_zones;
 
@@ -1183,6 +1185,11 @@ static inline unsigned short queue_max_segments(const struct request_queue *q)
 	return q->limits.max_segments;
 }
 
+static inline unsigned short queue_max_write_hints(struct request_queue *q)
+{
+	return q->limits.max_write_hints;
+}
+
 static inline unsigned short queue_max_discard_segments(const struct request_queue *q)
 {
 	return q->limits.max_discard_segments;
@@ -1230,6 +1237,11 @@ static inline unsigned int bdev_max_segments(struct block_device *bdev)
 	return queue_max_segments(bdev_get_queue(bdev));
 }
 
+static inline unsigned short bdev_max_write_hints(struct block_device *bdev)
+{
+	return queue_max_write_hints(bdev_get_queue(bdev));
+}
+
 static inline unsigned queue_logical_block_size(const struct request_queue *q)
 {
 	return q->limits.logical_block_size;
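
For illustration, here is how an application might query the proposed attribute and pre-check a hint value. This is a rough userspace sketch that assumes a kernel with this patch applied, so that /sys/block/<disk>/queue/max_write_hints exists. The helper names parse_max_write_hints(), read_max_write_hints() and write_hint_is_valid() are hypothetical and not part of any kernel or libc API.

```c
#include <assert.h>	/* only needed by callers that check the results */
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the attribute text ("<number>\n"). Returns 0 on success, -1 on error. */
static int parse_max_write_hints(const char *text, unsigned short *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(text, &end, 10);
	if (errno || end == text || v > USHRT_MAX)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;
	*out = (unsigned short)v;
	return 0;
}

/*
 * Read the attribute for a disk name such as "nvme0n1".
 * Returns -1 if the file is missing (for example on a kernel without
 * this patch) or if its contents cannot be parsed.
 */
static int read_max_write_hints(const char *disk, unsigned short *out)
{
	char path[256], buf[32];
	FILE *f;
	int n, ret;

	n = snprintf(path, sizeof(path),
		     "/sys/block/%s/queue/max_write_hints", disk);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	ret = fgets(buf, sizeof(buf), f) ? parse_max_write_hints(buf, out) : -1;
	fclose(f);
	return ret;
}

/*
 * Mirrors the check added to blkdev_direct_IO(): a hint greater than
 * the maximum is rejected (the kernel returns -EINVAL). Hint 0 always
 * passes, including when max is 0 (write hints not supported).
 */
static int write_hint_is_valid(unsigned short hint, unsigned short max)
{
	return hint <= max;
}
```

A caller would read the limit once with read_max_write_hints() and then use write_hint_is_valid() before issuing direct I/O with a hint, rather than discovering the limit through -EINVAL.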