Message ID | 20240628212350.3577766-1-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New |
Series | [RFC] bdev: use bdev_io_min() for statx DIO min IO |
On Fri, Jun 28, 2024 at 02:23:50PM -0700, Luis Chamberlain wrote:
> We currently rely on the block device logical block size for the
> offset alignment. While this *works* it doesn't work with performance
> in mind. That's exactly what the minimum_io_size attribute is for.
>
> This would for example enhance performance for DIO on 4k IU drives which
> have for example an LBA format of 512 bytes for both HDDs and NVMe.
> Another use case is to ensure that DIO will be used with 16k IOs on
> existing market 16k IU drives with an LBA format of 4k or 512 bytes.

The minimum_io_size clearly is the minimum I/O size, not the minimal
nice to have one.

Changing this will break existing setups.

On Fri, Jun 28, 2024 at 11:25:34PM -0700, Christoph Hellwig wrote:
> On Fri, Jun 28, 2024 at 02:23:50PM -0700, Luis Chamberlain wrote:
> > We currently rely on the block device logical block size for the
> > offset alignment. While this *works* it doesn't work with performance
> > in mind. That's exactly what the minimum_io_size attribute is for.
> >
> > This would for example enhance performance for DIO on 4k IU drives which
> > have for example an LBA format of 512 bytes for both HDDs and NVMe.
> > Another use case is to ensure that DIO will be used with 16k IOs on
> > existing market 16k IU drives with an LBA format of 4k or 512 bytes.
>
> The minimum_io_size clearly is the minimum I/O size, not the minimal
> nice to have one.

I may have misread the below documentation then, because it seems to
suggest this is a performance parameter, not a real minimum. Do we need
to update it?

What:           /sys/block/<disk>/queue/minimum_io_size
Date:           April 2009
Contact:        Martin K. Petersen <martin.petersen@oracle.com>
Description:
                [RO] Storage devices may report a granularity or preferred
                minimum I/O size which is the smallest request the device can
                perform without incurring a performance penalty. For disk
                drives this is often the physical block size. For RAID arrays
                it is often the stripe chunk size. A properly aligned multiple
                of minimum_io_size is the preferred request size for workloads
                where a high number of I/O operations is desired.

If this is not the right place, do we need to use a new topology entry
for the IU? Today the NVMe drive uses it for the NPWG which these days
is the IU.

> Changing this will break existing setups.

My impression was that STATX_DIOALIGN was rather new, so any new
enhancements due to the difficulties in getting DIO alignment right,
this was the right place to do so. Let me know if you have other
suggestions.

  Luis

On Sat, Jun 29, 2024 at 08:24:00PM -0700, Luis Chamberlain wrote:
> > The minimum_io_size clearly is the minimum I/O size, not the minimal
> > nice to have one.
>
> I may have misread the below documentation then, because it seems to
> suggest this is a performance parameter, not a real minimum. Do we need
> to update it?

queue_limits.min_io is correctly described and a performance hint.

The statx dio_offset_align is the actual minimum I/O size and alignment
and not in any way related to the performance hint in minimum_io_size.

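To make the distinction above concrete, here is a minimal sketch of the check an application might apply before issuing a direct I/O request, using the values statx reports for STATX_DIOALIGN. The function name dio_request_is_aligned() is hypothetical, not a kernel or libc API, and it assumes the documented semantics: offsets and lengths must be multiples of dio_offset_align, buffer addresses must be multiples of dio_mem_align, and a value of 0 means direct I/O is not supported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel or libc API): returns true when a
 * direct I/O request satisfies the alignment values reported through
 * STATX_DIOALIGN.
 */
bool dio_request_is_aligned(const void *buf, uint64_t offset, size_t len,
                            uint32_t mem_align, uint32_t offset_align)
{
    assert(buf != NULL);

    /* A zero alignment means direct I/O is not supported on the file. */
    if (mem_align == 0 || offset_align == 0)
        return false;

    /* The user buffer address must honour the memory alignment. */
    if ((uintptr_t)buf % mem_align != 0)
        return false;

    /* Both the file offset and the length must honour the offset alignment. */
    return offset % offset_align == 0 && len % offset_align == 0;
}
```

With a 512-byte logical block size, a 512-byte request at offset 512 passes this check. If the same device reported 4096 as dio_offset_align, an application applying this check would reject that request even though the device accepts it, which is one way to read the concern about existing setups.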
On Sat, Jun 29, 2024 at 10:54:19PM -0700, Christoph Hellwig wrote:
> On Sat, Jun 29, 2024 at 08:24:00PM -0700, Luis Chamberlain wrote:
> > > The minimum_io_size clearly is the minimum I/O size, not the minimal
> > > nice to have one.
> >
> > I may have misread the below documentation then, because it seems to
> > suggest this is a performance parameter, not a real minimum. Do we need
> > to update it?
>
> queue_limits.min_io is correctly described and a performance hint.

OK, great!

> The statx dio_offset_align is the actual minimum I/O size and alignment
> and not in any way related to the performance hint in minimum_io_size.

Oh, darn, I just read again 825cf206ed510 ("statx: add direct I/O
alignment information") and the block layer change through commit
2d985f8c6b91b ("vfs: support STATX_DIOALIGN on block devices") and
nowhere do I see any mention of it being a min. Should we clarify
that?

And should we add a respective value for performance? I suspect
userspace will want to work with optimal values, not ones which
could for instance incur read-modify-write.

Although we have BLKIOMIN to get the optimal performance min IO and
BLKIOOPT to get the optimal size, it is not terribly clear to me that
users know they should prefer to align to BLKIOMIN and use that for
a DIO size for writes when possible.

  Luis

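For reference, a rough sketch of how userspace can already fetch these hints from a block device with the BLKSSZGET, BLKIOMIN and BLKIOOPT ioctls from <linux/fs.h>. The struct bdev_topology type and both function names are made up for illustration, and the fallback policy in bdev_pick_write_size() is only an assumption about what a cautious application might do, not something the kernel mandates.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical container for the values discussed above. */
struct bdev_topology {
    unsigned int logical; /* BLKSSZGET: logical block size, the hard minimum */
    unsigned int io_min;  /* BLKIOMIN: minimum_io_size, a performance hint */
    unsigned int io_opt;  /* BLKIOOPT: optimal_io_size, 0 if not reported */
};

/* Query the topology of a block device. Returns 0 on success, -1 on error. */
int bdev_topology_query(const char *dev, struct bdev_topology *t)
{
    int lbs = 0; /* BLKSSZGET writes an int, the other two an unsigned int */
    int rc = 0;
    int fd;

    assert(dev != NULL && t != NULL);

    fd = open(dev, O_RDONLY);
    if (fd < 0)
        return -1;

    if (ioctl(fd, BLKSSZGET, &lbs) != 0 ||
        ioctl(fd, BLKIOMIN, &t->io_min) != 0 ||
        ioctl(fd, BLKIOOPT, &t->io_opt) != 0)
        rc = -1;
    else
        t->logical = (unsigned int)lbs;

    close(fd);
    return rc;
}

/*
 * Assumed policy: prefer io_min as the write granularity when it is a
 * multiple of the logical block size, otherwise use the logical block size.
 */
unsigned int bdev_pick_write_size(const struct bdev_topology *t)
{
    assert(t != NULL);

    if (t->logical != 0 && t->io_min >= t->logical &&
        t->io_min % t->logical == 0)
        return t->io_min;
    return t->logical;
}
```

A caller would run bdev_topology_query() on the device node it is about to open with O_DIRECT and size its writes in multiples of bdev_pick_write_size(), while still treating the logical block size as the only hard requirement.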
On Sun, Jun 30, 2024 at 01:42:45PM -0700, Luis Chamberlain wrote:
> On Sat, Jun 29, 2024 at 10:54:19PM -0700, Christoph Hellwig wrote:
> > On Sat, Jun 29, 2024 at 08:24:00PM -0700, Luis Chamberlain wrote:
> > > > The minimum_io_size clearly is the minimum I/O size, not the minimal
> > > > nice to have one.
> > >
> > > I may have misread the below documentation then, because it seems to
> > > suggest this is a performance parameter, not a real minimum. Do we need
> > > to update it?
> >
> > queue_limits.min_io is correctly described and a performance hint.
>
> OK, great!
>
> > The statx dio_offset_align is the actual minimum I/O size and alignment
> > and not in any way related to the performance hint in minimum_io_size.
>
> Oh, darn, I just read again 825cf206ed510 ("statx: add direct I/O
> alignment information") and the block layer change through commit
> 2d985f8c6b91b ("vfs: support STATX_DIOALIGN on block devices") and
> nowhere do I see any mention of it being a min. Should we clarify
> that?

dio_offset_align is an _alignment_ parameter, not a "size"
parameter. In no way does it define either the absolute or "best
performance" minimum IO size - it just defines the minimum valid
alignment for the file offset that the filesystem/device supports.

It is implied that an IO of dio_offset_align bytes in length will be
supported, because that is the minimum length IO that meets the
offset alignment requirements defined by dio_offset_align.

> And should we add a respective value for performance?

We could, but we already have:

        __u32   stx_blksize;    /* Block size for filesystem I/O */

which is defined as:

        stx_blksize
                The "preferred" block size for efficient filesystem I/O.
                (Writing to a file in smaller chunks may cause an
                inefficient read-modify-rewrite.)

> I suspect
> userspace will want to work with optimal values, not ones which
> could for instance incur read-modify-write.

Yup, that's the very definition of what stx_blksize should contain.

-Dave.

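As an illustration of the split described above, here is a small sketch that reads both values with statx(2): stx_dio_offset_align and stx_dio_mem_align as the hard alignment constraints, and stx_blksize as the preferred size. The struct dio_hints type and the dio_query() and dio_pick_len() helpers are hypothetical names, and rounding the preferred size up to a multiple of the offset alignment is an assumed policy. The statx call is guarded because STATX_DIOALIGN only exists with sufficiently recent headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for statx() and struct statx */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical container for the statx fields discussed above. */
struct dio_hints {
    uint32_t mem_align;    /* stx_dio_mem_align: buffer address alignment */
    uint32_t offset_align; /* stx_dio_offset_align: offset/length alignment */
    uint32_t preferred;    /* stx_blksize: preferred size for efficient I/O */
};

/* Fill *h from statx(). Returns 0 on success, -1 if DIO info is unavailable. */
int dio_query(const char *path, struct dio_hints *h)
{
    assert(path != NULL && h != NULL);
#ifdef STATX_DIOALIGN
    struct statx stx;

    if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS | STATX_DIOALIGN, &stx) != 0)
        return -1;
    /* The mask bit is only set when the file system reports DIO alignment. */
    if (!(stx.stx_mask & STATX_DIOALIGN) || stx.stx_dio_offset_align == 0)
        return -1;

    h->mem_align = stx.stx_dio_mem_align;
    h->offset_align = stx.stx_dio_offset_align;
    h->preferred = stx.stx_blksize;
    return 0;
#else
    return -1; /* headers too old to know about STATX_DIOALIGN */
#endif
}

/*
 * Assumed policy: use the preferred size, rounded up to a multiple of the
 * offset alignment so the resulting length is always valid for direct I/O.
 */
uint64_t dio_pick_len(const struct dio_hints *h)
{
    uint64_t align, want;

    assert(h != NULL);
    align = h->offset_align ? h->offset_align : 1;
    want = h->preferred > align ? h->preferred : align;
    return (want + align - 1) / align * align;
}
```

The point of keeping the two values separate is that dio_offset_align stays the smallest alignment the application may use, while the length it chooses to issue can follow the performance hint.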
diff --git a/block/bdev.c b/block/bdev.c
index 1b4af2cc3b1e..5d0874aa8661 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1282,7 +1282,7 @@ void bdev_statx(struct inode *backing_inode, struct kstat *stat,
 	if (request_mask & STATX_DIOALIGN) {
 		stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
-		stat->dio_offset_align = bdev_logical_block_size(bdev);
+		stat->dio_offset_align = (unsigned int) bdev_io_min(bdev);
 		stat->result_mask |= STATX_DIOALIGN;
 	}
We currently rely on the block device logical block size for the
offset alignment. While this *works* it doesn't work with performance
in mind. That's exactly what the minimum_io_size attribute is for.

This would for example enhance performance for DIO on 4k IU drives which
have for example an LBA format of 512 bytes for both HDDs and NVMe.
Another use case is to ensure that DIO will be used with 16k IOs on
existing market 16k IU drives with an LBA format of 4k or 512 bytes.

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 block/bdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)